by
Nav_Panel
236 days ago
Love it, they're teaching LLMs how to skim texts properly, which is exactly the right approach for handling long contexts.
ProofHouse
236 days ago
Wasn't this the attention sink concept, to some degree? It doesn't seem out of the realm of possibility that, if the latency overhead isn't significant, frontier models start adopting something similar to the DeepSeek OCR tech.