by _petronius 604 days ago
I think this is a bad-faith argument, if you know anything at all about how machine-learning systems work (in general, and LLMs in particular), and I wish people would stop trotting it out.

First, there are repeated, documented examples of prompts designed to make an LLM output its training data (first link from a quick Google search: https://www.darkreading.com/cyber-risk/researchers-simple-te... but there are other examples, and some well-discussed ones on this site involving GitHub's own Copilot).

Second, the "it's just like a person learning" argument has been applied to all sorts of machine learning, and it rests on several fallacies:

1. That these systems learn the way humans do, and innovate on that learning

2. That their output constitutes any kind of original thought (related, LLM output is not copyrightable; human output is)

3. Most importantly, the scale is totally different. I think we can agree on the trivial example: training an image-generation model on an artist's style and using it at scale to undercut the market for their work would constitute a kind of technologically enabled competition that normal humans learning and copying styles could not equal, either in speed or in cost -- even if the quality of the human work were orders of magnitude better.