Hacker News
The Pile: An 800GB Dataset of Diverse Text for Language Modeling [pdf] (pile.eleuther.ai)
1 point by nixtaken 1983 days ago
1 comment