Hacker News
Colossal Clean Crawled Corpus (C4): Open-Source NLP Pretraining Corpus by Google (tensorflow.org)
4 points by Riccardo_G 2310 days ago