Hacker News
by sheepdestroyer 501 days ago
They could easily list the data used, though. These datasets are mostly known and floating around. When they construct new ones, they could provide instructions for replication too.

They could, but even if they gave this list, the detractors would still say it is not open source.
Yes, and as a bonus they may get sued, which in the long term makes free / offline models non-viable.

It would be so much better if all models were trained with LibGen.