Hacker News
by afpx 63 days ago
The article describes "the pile" as an "unfiltered scrape by design", but the paper actually describes it as a bizarre mix of curated sources. https://arxiv.org/pdf/2101.00027

Generally, I find that LLMs are overtrained on promotional material and professionally published content.