Hacker News
RedPajama-Data-v2: 30T tokens filtered and de-duplicated (twitter.com)
4 points by leumassuehtam 968 days ago