Hacker News
by exe34 504 days ago
> they are by definition the average of the internet.

Are you referring to base models?

Nowadays they also train on stolen books and are further "aligned" based on feedback. I imagine they are already learning to teach based on feedback from users.

1 comments

To be honest, I was using "the internet" as shorthand for the average of human knowledge, on the basis that most books, peer-reviewed articles, and everything else are already on the internet, even if they exist only in the more unsavory corners (I've seen nothing to suggest the FM producers were / are much bothered about where the data is from).

But yes, I was referring to base models. I'm also not convinced that the average book is any more trustworthy than the average webpage, whether that be a purely technical book, where you really need the webpage of errata to be able to use the examples, or the more pop-sci books that cherry-pick data and jump to completely unfounded conclusions (I'm thinking of the "ancient engineers" / "aliens built the pyramids" books).

The feedback is great and might work in some areas, such as technical knowledge. But once you step outside of the physical sciences and engineering, you don't so much end up with better quality information, just a curated experience that aligns with the model owners' views (think DeepSeek and Tiananmen Square).

The nice thing about books, especially STEM ones, is that you can tell if there's a problem because there will be inconsistencies. So even without the errata, you can fuzz until it all matches.