Hacker News new | ask | show | jobs
by josephg 1034 days ago
And I’ll add some more:

- LLMs aren’t very good at large scale narrative construction. They get too distracted by low level details that they miss the high level details in long text. It feels like the same problem as stable diffusion giving people too many fingers.

- LLMs have 2 kinds of memory: current activations (context) and trained weights. This is like working memory and long term memory. How do we add short term memory? Like, if I read a function, I summarize it in my head and then remember the summary for as long as it’s relevant. (Maybe 20 minutes or something). How do we build a mechanism that can do this?

- How do we do gradient descent on the model architecture itself during training?

- Humans have lots more tricks to use when reading large, complex text - like re-reading relevant sections, making notes, thinking quietly, and so on. Can we introduce these thinking modalities into our systems? I bet they’d behave smarter if they could do this stuff.

- How do we combine multiple LLMs into a smarter overall system? Eg, does it make sense to build committees of “experts” (LLMs taking on different expert roles) to help in decision making? Can we get more intelligence out of chatgpt by using it in a different way in a larger system?