by pointlessone
906 days ago
I think the premise of your question is incorrect. It’s very likely no one thinks it’s perfectly legal. There are probably many people who think it’s not a big deal, though. Try coming up with a dataset that doesn’t have any copyrighted material in it. Like, seriously, try. You can’t use pretty much anything less than a century old. Everything is copyrighted by default. Very few new things are explicitly in the public domain or licensed in a way that would allow this use. Now imagine an LLM trained on early 20th century newspapers, books and letters. Do you think it would be good at generating code or hip copy for the homepage of your next startup?
|
Not sure about the rest of the world, but at least for US content I don't think any company would publish that LLM.
That's like 40 years before the civil rights movement, and right about the time of the Tulsa massacre.
It's right around when women got the right to vote.
Trying to get it to not say anything horrible by modern standards seems fraught with issues. I don't know if it would even understand something like "don't be racist", given the context it was trained on.