HN Mirror



	by veunes 158 days ago
	If just 16 million examples were enough to significantly boost model quality (as Anthropic claims), then data quality beats quantity. Instead of vacuuming up petabytes of trash from Common Crawl, you can just take a high-quality distillate from a SOTA model and get comparable results. That's bad news for anyone betting solely on massive compute clusters and closed datasets.