Hacker News
by K0balt 27 days ago
Meh, while I’d agree that LLMs are idiot savants more than geniuses, I think you underestimate the general quality of training data. First, it’s all trained on data that was published or written. People below 80 IQ rarely publish or write at all, and when they do you can filter it with a regex. So already you skew the curve up 15 points or so. Then factor in that published usually means 120+, and that the data also includes the collective treasures of civilization. Even the average Joes are going to skew towards things they are knowledgeable and passionate about, putting their best foot forward and so on (and the trolls get regexed to oblivion). Only the very clever trolls get through, and at least they pattern-match for clever.