| HN Mirror



	by bredren 419 days ago
	Do we have estimates of the corpus that is available? This model's repo describes "multiple strategies to generate massive diverse synthetic reasoning data." FWIW, AI 2027 forecasts heavy emphasis on synthetic data creation. Is the lack of an existing corpus just an extra hurdle for Hanzi-first models that are also leading the pack in benchmarks?