Hacker News
by citboin 633 days ago
>The main thing you can do is support companies and groups who are releasing open source models. They are usually using their own data.

Alternatively, we could create standardized open source training data from sources like Wikipedia, Wikimedia, public domain literature, and open courseware. I'm sure there are many other free and legal sources of data.

1 comment

But the training data is one of the key bits that make or break your model's performance.

There is a reason why datasets are private and the model weights aren't.