How could they publish the terabytes of training data? A million RAR files?

Honestly, would that part even be useful? Like, I want to know how they did the training so I can repro it with my own training data, right?

I mean, isn't that the future? Somebody figures out how to do P2P distributed training, and groups can crawl the web and train their own open-source models?