Hacker News
user: hynky
created: 2024-12-08
karma: 19
submissions:
FinePDFs: 3T token dataset made from internet PDFs
3 points | 1 comment
FineWeb2: Adapting Pre-Training Data Processing to Every Language
7 points | 0 comments
FineWeb2 dataset: A sparkling update with 1000s of languages
2 points | 0 comments