How could they publish the terabytes of training data? A million RAR files?
Honestly, would that part even be useful? Like, I want to know how they did the training so I can repro it with my own training data, right?
I mean, isn't that the future? Somebody figures out how to do P2P distributed training, and groups can crawl the web and train their own open-source models?