Hacker News
by Teever 402 days ago
If common culture is an effective substrate for communicating ideas (as in, we can use shared pop culture references to build metaphors that explain complex ideas), then the common culture that large companies have ensnared in excessively long copyrights and trademarks to generate massive profits is a useful thing for an LLM designed to convey ideas to have embedded in it.

If I'm learning about kinematics, maybe it would be more effective to have comparisons to Superman flying faster than a speeding bullet, and no amount of dry textbooks and academic papers will make up for the lack of such a comparison.

This is especially relevant when we're talking about science fiction, which has served as the inspiration for many of the leading-edge technologies we use, including LLMs and AI.

1 comment

Fair point. We use metaphor to explain and understand a variety of topics, and a lot of those metaphors are best understood through pop culture analogies.

A reasonable compromise, then, is that you can train an AI on Wikipedia, more or less. An AI trained this way will have a robust understanding of Superman, enough that it can communicate through metaphor, but it won't have the training data necessary to create a ton of infringing content about Superman (well, it won't be able to create good infringing content anyway; it'll probably have access to a lot of plot summaries, but nothing that would help it make a particularly interesting Superman comic or video).

To me, it seems like encyclopedias use copyrighted pop culture in a way that constitutes fair use, so training on them seems fine as long as the encyclopedias consent to it.