Hacker News
by amrrs 505 days ago
This has been the problem with a lot of long-context use cases. It's not just whether the model supports the context length, but also whether you have enough compute and can live with the inference time. This is exactly why I was excited about Mamba and now possibly Lightning Attention.

That said, the new DCA that these models rely on to provide long context could be an interesting area to watch.