by forrest2 538 days ago

This is largely a side effect of mimicking the distribution on the internet via pretraining.

It's a good basis for setting up a model of the world since we have so much data and it's free.

Post-training techniques like DPO and RLHF are then about using minimal hand-curated data (expensive!) to shift that distribution closer to standard human / desired behavior.
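To make "shift that distribution" a bit more concrete, here's a rough sketch of the per-example DPO loss in plain Python. The function name, the log-probability numbers, and beta=0.1 are all made up for illustration; real training computes the log-probs from a policy model and a frozen reference model over batches of preference pairs.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities (hypothetical helper).

    Each argument is log p(response | prompt) under the policy or the frozen
    reference model, for the human-preferred ("chosen") or "rejected" response.
    """
    # How much more the policy favors chosen over rejected than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

# Illustrative numbers only (assumed, not from any real model).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)  # policy identical to reference
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)  # policy now prefers the chosen answer
print(baseline, improved)
```

The loss drops as the policy puts relatively more probability on the preferred answer than the reference does, which is the "shift" a small set of hand-curated pairs buys you.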

It will continue to get better: early versions of ChatGPT were taught to say "I don't know" with something like 20 training examples, and they got substantially better from those alone. As the number of training examples grows with the amount of capital invested, there will be more patterns that get latched onto and expressed by attention in these models.

----

It will take time, but they'll get pretty robust. Models will still be susceptible to Dunning-Kruger / ignorance, though. They aren't perfect, AND that overconfidence is in their training data, thanks to us humans they're copying.