by krackers 78 days ago
Yeah, I think the ERD result was already known; see "The Remarkable Robustness of LLMs: Stages of Inference?" https://arxiv.org/abs/2406.19384

But the fact that the intermediate circuits are generic and robust enough that you can just loop them is unexpected. Maybe it sort of makes sense in retrospect: the paper above and others showed that the middle layers of an LLM behave more like "iterative refinement", so to use a signal-processing analogy, maybe you just keep applying filters and suppressing the noise.

But by that same analogy, I'd predict that you can't just keep repeating layers; at some point you'll suppress the signal as well. Not sure if anyone has run an experiment on how many times you can repeat RYS layers before performance goes back down.
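
To make the filter analogy concrete, here's a toy sketch in plain Python. It only illustrates the signal-processing intuition and says nothing about actual transformer layers: the sine "signal", the Gaussian noise, the 3-tap smoothing kernel, and all the parameter values are assumptions I picked for illustration. It applies the same low-pass filter over and over and tracks the error against the clean signal after each pass.

```python
import math
import random


def smooth(x):
    """One pass of a 3-tap low-pass filter [0.25, 0.5, 0.25] with wrap-around."""
    n = len(x)
    return [0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[(i + 1) % n] for i in range(n)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def error_per_pass(n=256, cycles=4, noise_std=0.5, passes=400, seed=0):
    """Return the error vs. the clean signal after 0, 1, ..., `passes` filter passes."""
    rng = random.Random(seed)
    # Hypothetical setup: a slow sine wave plus white Gaussian noise.
    clean = [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
    x = [c + rng.gauss(0.0, noise_std) for c in clean]
    errors = [mse(x, clean)]
    for _ in range(passes):
        x = smooth(x)  # the same "layer" applied again
        errors.append(mse(x, clean))
    return errors


errors = error_per_pass()
best = min(range(len(errors)), key=errors.__getitem__)
print(f"pass 0: {errors[0]:.4f}")
print(f"best pass {best}: {errors[best]:.4f}")
print(f"pass {len(errors) - 1}: {errors[-1]:.4f}")
```

In this toy setup the error drops at first, because the high-frequency noise is removed quickly, and then climbs again as repeated smoothing starts attenuating the sine wave itself. Whether repeated LLM layers show the same kind of sweet spot is exactly the open question.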