by usernametaken29 36 days ago
While 100 million tokens sounds like a lot, think about it for a bit and you’ll see why it is basically nothing. Try to cram a human lifetime of sounds, smells, video and other sensory data into 100 million tokens. Heck, try to fit the video of a single series into that window. It just won’t work, it won’t scale, and it is laughable compared to contextual memory. I’m not saying that to belittle the authors of the paper, but the reality is that this has very little to do with transient long-term memory.

You don't remember a lifetime of smells. You don't have any memories from huge swaths of time. There are entire years of your life compressed down to vibes and a handful of events you largely misremember.

That’s a very weak argument. Memories are not exact replicas of experiences. We know that many memories are retained through a lifetime, particularly the ones from early childhood. Unlike computers, we always reconstruct memories from several modalities. Even if we remember largely on vibes as you say (which is not true when you look into neuroscience), the sheer amount of information is overwhelming. Again, try to run a 90-minute movie through an LLM memory system. It won’t be able to tell you the plot. That’s before you even feed it sound. Even 100M tokens is not enough for that. You, on the other hand, will largely remember the movies you liked and their major plot lines, and from there be able to reconstruct their scenes. I think the engineers working on memory vastly underestimate the capacity problem of discrete states.

blah blah we know that blah neuroscience blah blah blah.

This isn't an argument you are making. It's just an assertion that you could make an argument if you were so inclined, but you won't be doing so at this time. "Science" is obviously on your side, but you can't be bothered to say how, or even to give enough detail for someone to check what you are referring to. I can do that too; see my first sentence in this reply.

I don't know how LLM memory systems work. I do know that you don't have a lifetime of remembering everything with high precision. Not only do most people not remember the plot of most of the movies they have seen, they can't reliably list most of the movies they have seen. Not everyone has a good memory. My point is that it's not valid to reference a false model of how human memory works as a reason some specific LLM memory implementation isn't useful for solving some problems.

Exactly, and for a given task you don't need to recall what your friend's brother's name is to do a git commit and push. There's a pull toward more context to make these things better, but there's also a pull toward making them run effectively in a small context when appropriate.

I'm more on team small tasks because of my love of Unix piping. I keep telling folks that, as an old Linux dude, seeing subagents work together for the first time felt like learning to pipe sed and awk for the first time. I realized how powerful these could be, and we still seem to be going in that direction.

I think you underestimate just how much information 100M-ish words is. It's like a 300,000-page novel. That's a 50-foot (~15 meter) thick book.
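A rough sanity check of those figures, as a minimal sketch. The words-per-page and sheet-thickness values are assumptions (a fairly dense page and ordinary office paper), not numbers from the comment, and `book_dimensions` is a hypothetical helper written just for this estimate.

```python
import math


def book_dimensions(words, words_per_page=333, sheet_thickness_mm=0.1):
    """Estimate page count and spine thickness for a book of `words` words.

    words_per_page and sheet_thickness_mm are assumed values, chosen only
    to illustrate the arithmetic; real books vary quite a bit.
    """
    pages = math.ceil(words / words_per_page)
    # Each sheet of paper carries two pages (front and back).
    sheets = math.ceil(pages / 2)
    thickness_m = sheets * sheet_thickness_mm / 1000
    return pages, thickness_m


pages, thickness_m = book_dimensions(100_000_000)
feet = thickness_m / 0.3048  # exact metres-per-foot conversion
print(f"{pages:,} pages, about {thickness_m:.1f} m ({feet:.0f} ft) thick")
```

Under those assumptions the estimate lands at roughly 300,000 pages and about 15 m, so the ballpark holds; change the defaults and the result moves proportionally.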

Surely with (much less than) 300K pages you could describe every meaningful detail of a video series' plot. You don't need to remember the exact pixel values.

You can also scale the numbers up. I specifically chose a relatively small model and short context length as a reference, so 100x bigger is not out of the question. At that point, with a 10B token capacity, you are looking at all of English Wikipedia in a single state.