Hacker News
by sbassi 245 days ago
Which data was used for training?

I think he mentioned somewhere that he used FineWeb (I assume this one: https://huggingface.co/datasets/HuggingFaceFW/fineweb).