Hacker News
by vanuatu 17 days ago
All the labs "clean" their pretraining data, and you can keep your pretraining data minimally AI-generated while still spamming synthetic post-training data.