by afpx
63 days ago
The article describes "the pile" as an "unfiltered scrape by design". But the paper actually describes it as a bizarre mix of curated sources: https://arxiv.org/pdf/2101.00027 Generally, I find that LLMs are overtrained on promotional materials and professionally published content.