Hacker News
by Nav_Panel 236 days ago
Love it, they're teaching LLMs how to skim texts properly, which is exactly the right approach for handling long contexts.

Wasn't this the attention sink concept to some degree? I mean, it doesn't seem out of the realm of possibility that, if the latency overhead isn't significant, frontier models start adopting something similar to the DeepSeek OCR tech.