Hacker News new | ask | show | jobs
by ProofHouse 236 days ago
wasn't this the attention sink concept to some degree? I mean it doesn't seem out of the realm of possibility that if the latency overhead isn't signifigant, that frontier models start adopting similar to DeepSeek OCR tech