Hacker News
by __alexs 76 days ago
This hasn't been the full story for years now. All SOTA models are heavily post-trained with reinforcement learning to improve performance on specific problems and interaction patterns.

The vast majority of this training data is generated synthetically.