Hacker News
user: hynky
created: 2024-12-08
karma: 19

submissions:

FinePDFs: 3T token dataset made from internet PDFs
3 points | 1 comment
FineWeb2: Adapting Pre-Training Data Processing to Every Language
7 points | 0 comments
FineWeb2 dataset: A sparkling update with 1000s of languages
2 points | 0 comments