Hacker News
by alansaber 115 days ago
True, but the fundamental architecture tends not to be radically different; it's more about the training/RL regime.

But the point is that, to even begin to claim that a limitation holds for all LLMs, you can't rely on empirical results that have been demonstrated only for a few old models. You either need a theoretical proof, or you need empirical results that hold for all existing models, including the latest ones.