by belter
806 days ago
For all the Model Cards and License notices, I find it interesting that there is not much information on the contents of the dataset used for training, specifically whether it contains data subject to copyright restrictions. Or did I miss that?

I'd say the majority of instruct tunes, for instance, use OpenAI output (which is against their TOS).
But it's all just research! So who cares! Or at least, that seems to be the mood.