Hacker News
by kelipso 499 days ago
A common complaint about open-source ML models is that they don’t provide or describe the data used to train them. Sometimes that’s a valid complaint, since it may not be clear what kind of data went into the model; other times it isn’t, because the data source is already obvious.

I think it’s a somewhat overdone complaint and I usually ignore it. Besides, it looks like there’s an ongoing Hugging Face project trying to replicate the training process for this model anyway.