Hacker News new | ask | show | jobs
by energy123 371 days ago
But that information is scattered. It's helpful for the LLM to cluster and isolate local reasoning that it can then "forget" about when it moves on to the next thing. Attending to nearby recent tokens is easy for it, looking up relevant information as needle in a haystack every single time is more error prone. I'm not saying asking it to remove comments will lead to a catastrophic drop off in performance, maybe something like a few percent or even less. Just that it's not useless for pure benchmaxxing.