by pwendell 1181 days ago
Yes, it's nuanced, but it will be simplified going forward.

This uses a fully open source (liberally licensed) model and we also open sourced (liberally licensed) our own training code. However, the uptraining dataset of ~50,000 samples was generated with OpenAI's text-davinci-003 model, and depending on how one interprets their terms, commercial use of the resulting model may violate the OpenAI terms of use. For that reason we are advising only noncommercial use of this model for now.

The next step here is to create a set of uptraining samples that is 100% open. Stay tuned.

1 comment

Are you in touch with the OpenAssistant team? I believe they already have a more or less complete set of samples (100,000!) that were produced in an open environment and aren't encumbered by any licensing.
No, I hadn't heard of that; we'll engage with that team. This is exactly what we need, and we will look into it.