Hacker News
by
a2128
438 days ago
You don't even need to deal with XML dumps or anything like that. Wikimedia publishes a complete dataset on Hugging Face that takes just a few lines to load in your Python training script:
https://huggingface.co/datasets/wikimedia/wikipedia
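
For anyone wondering what "a few lines" looks like, here is a rough sketch using the Hugging Face `datasets` library. The config name "20231101.en" (dump date plus language code) and the "text" field are my assumptions based on the dataset card, so check the page above for the currently available configs. The helper names `load_wikipedia`, `first_texts` and `preview` are made up for illustration.

```python
from itertools import islice


def load_wikipedia(config="20231101.en", streaming=True):
    """Load one Wikipedia dump from the Hugging Face Hub.

    The config name is an assumption (dump date + language code);
    see the dataset card for the list that actually exists.
    """
    # Imported here so the rest of the file works without the package.
    from datasets import load_dataset  # pip install datasets

    # streaming=True iterates over the data without downloading the full dump first.
    return load_dataset("wikimedia/wikipedia", config, split="train", streaming=streaming)


def first_texts(articles, n=3):
    """Return the "text" field of the first n articles from any iterable of dicts."""
    return [article["text"] for article in islice(articles, n)]


def preview(n=3, chars=200):
    """Print the start of the first n articles (needs network access)."""
    for text in first_texts(load_wikipedia(), n):
        print(text[:chars])
```

Calling `preview()` should print the start of the first three articles. In a real training script you would iterate over the object returned by `load_wikipedia()` and feed each article's text to your tokenizer.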