by visarga 980 days ago
I just read a couple of papers every day, the most interesting ones, and follow Reddit and Twitter to see what people are talking about. I am also directly interested in long-context LLMs for a work-related task.

I have also been dabbling with neural nets since before transformers, especially LSTMs, which have a "residual"-style connection, the one I was mentioning: the cell state is carried forward additively from step to step. That makes gradients better behaved. Schmidhuber tech.
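
To make the "residual" point concrete, here is a rough scalar sketch of the LSTM cell-state update and of why its gradient behaves better than a plain tanh RNN's. The function names and numbers are made up for illustration, and the gates are treated as constants along the path, which a real LSTM does not do (its gates depend on the previous hidden state), so take it as intuition only.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_state(c_prev: float, f_pre: float, i_pre: float, g_pre: float) -> float:
    """One scalar cell-state update: c_t = f * c_prev + i * g.

    The old state is scaled by the forget gate and then *added* to the new
    candidate, much like a residual connection.
    """
    f = sigmoid(f_pre)     # forget gate, in (0, 1)
    i = sigmoid(i_pre)     # input gate, in (0, 1)
    g = math.tanh(g_pre)   # candidate value, in (-1, 1)
    return f * c_prev + i * g


def grad_through_cell(f_pre: float, steps: int) -> float:
    """d c_T / d c_0 along the direct cell-state path.

    Simplifying assumption: the forget gate is the same constant at every step,
    so the gradient is just f multiplied by itself `steps` times.
    """
    return sigmoid(f_pre) ** steps


def grad_through_tanh_rnn(w: float, h_pre: float, steps: int) -> float:
    """d h_T / d h_0 for a scalar RNN h_t = tanh(w * h_{t-1} + ...).

    Simplifying assumption: the pre-activation is the same constant at every
    step, so each step contributes a factor w * (1 - tanh(h_pre)^2).
    """
    return (w * (1.0 - math.tanh(h_pre) ** 2)) ** steps


if __name__ == "__main__":
    # With the forget gate close to 1, the gradient survives 100 steps.
    print("LSTM cell path:", grad_through_cell(f_pre=5.0, steps=100))
    # The tanh RNN multiplies by a factor well below 1 each step and vanishes.
    print("tanh RNN path: ", grad_through_tanh_rnn(w=0.9, h_pre=1.0, steps=100))
```

The point is that the derivative of c_t with respect to c_{t-1} along that path is just the forget gate, with no squashing nonlinearity in between, so a gate near 1 lets the gradient through almost unchanged.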