by pk-protect-ai 849 days ago
> But luckily we got LWM, and that means we have only been 10xed.

If I remember the paper correctly, it mentioned a context of around 4M tokens. So the gap is not 10x but 2.5x (10M vs. 4M).

> What is really insane is that, we have had LLAMA2 for over a year now, and nobody else figured out how to get this result from it, despite it being around so long.

This isn't true. For now, extending context to 10M tokens is brute-forced with money: the increased hardware requirements for training and inference and the longer training time are ultimately financial costs too. And for now there simply is no leapfrogging solution, for open-source or commercial models, that would cut those costs by orders of magnitude.