
by donfuzius 1161 days ago

It's awesome that the OpenAssistant project made it this far with so much crowd-sourced input. Congrats to the whole team working so hard to create a truly open LLM.

One thing puzzles me, though: for the GPT-3.5 comparison, the model used was trained on both OpenAssistant and Alpaca data, and the Alpaca data is not free because it was generated under the OpenAI license. Isn't that defeating the purpose?

"... Completions were generated using pythia-12b-deduped fine-tuned on the OpenAssistant and Alpaca [9] dataset as well as gpt-3.5-turbo using the OpenAI API..."


gkbrk 1160 days ago

> due to the OpenAI license used to generate the data.

What makes you think OpenAI responses are copyrighted in any way?

QuadmasterXLII 1160 days ago

If OpenAI owns OpenAssistant because it was trained in part on ChatGPT outputs, then Andrew Hussie owns ChatGPT because it was trained in part on Homestuck.

jacooper 1160 days ago

donfuzius 1160 days ago

This is not about copyright but about the OpenAI terms of use you agree to when you use ChatGPT or the API, which forbid using the output to build «competing models».

kristofferR 1160 days ago

I'd rather think it's the opposite: it's almost certainly the case that it is not, since it is obviously completely transformative.