Hacker News
by kiraaa 850 days ago
Maybe they are using ring attention on top of their 128k model.

More likely some clever take on RAG. There’s no way that the full 1M context is available at all times; more plausibly, parts of it are retrievable on demand. Hence the retrieval-like use cases you see in the demos: the goal is to find a thing, not to find patterns at a distance.
Could be true; we can only speculate.