Nvidia open source LLM Nemotron 4 340B at top of the charts [pdf] (d1qx31qr3h6wln.cloudfront.net)
17 points by moondistance 731 days ago
1 comment

>We use synthetic data heavily to create Nemotron-4-340B-Instruct: over 98% of our training data has been synthetically generated throughout our alignment process.

Very interesting to see synthetic data used so heavily during alignment.

Are there any known models that make heavy use of synthetic data during pretraining?