by everforward 906 days ago
> Now imagine LLMs trained on early 20th century newspapers, books and letters. Do you think it would be good at generating code or hip copy for homepage of your next startup?

Not sure about the rest of the world, but at least for US content I don't think any company would publish that LLM.

That's like 40 years before the civil rights movement, and right about the time of the Tulsa massacre.

It's right around when women got the right to vote.

Trying to get it to not say anything horrible under modern standards seems fraught with issues. I don't know if it would even understand something like "don't be racist", given the context it was trained on.

1 comment

Exactly. Copyright terms are so long that most material with expired copyright is not useful for modern uses of LLMs, and finding modern non-copyrighted material is too hard to do quickly, and its usefulness is unclear. So people who grew up with the Internet and are used to making memes with copyrighted material are not exactly averse to doing it on a bigger scale.