Hacker News
by soerxpso 23 days ago
> It’s like if I go to Golden Gate Park and pick one flower, I shouldn’t do that, but no one cares. But if I build a machine to automatically cut every flower in the park because I want to sell them, that’s different.

It's not like that, because flowers are a physical object and moving them to one place deprives their original location of the flowers. When an LLM learns something from a webpage, the webpage is still there. Whatever 'theft' you perceive is entirely in your head; you were deprived of nothing by someone else making a copy of your thing.

7 comments

This is not true. The copy devalues the original, so even though the web page is still there, its value has decreased.
"It's not like that"

That's not the point. The point is that scale matters, and that was the only point.

> Whatever 'theft' you perceive is entirely in your head

Rather, it appears to be in your head, since the person you’re replying to has not mentioned or even hinted at theft. The problem with taking all the flowers from a public park for your own profit is multifaceted. Among other things, you’re depriving everyone else of the chance to enjoy them, but you’re also degrading the image of the park and harming all the insects which depend on those flowers and the birds which depend on those insects, which in turn degrades the park further, which stops people from enjoying it, going there, and caring for it. It’s not about a single physical object, it’s about the ripple effect the selfish action produces.

It's not like that, because flowers are a physical object and moving them to one place deprives their original location of the flowers. When an LLM learns something from a webpage, the webpage is still there. Whatever 'theft' I perceive is entirely in my head; I was deprived of nothing by someone else making a copy of my thing.
I get that the intention here is to plagiarize and thus cause the parent to feel the harm of it and realize the error of their ways, but I don't think it works. Plagiarism's harm to the plagiaree (?) is that it robs them of credit and payment, but nobody is viewing your reply in isolation from the parent's attribution, and the parent wasn't expecting to make money off of an HN comment. The harm to the rest of society, where you gain false esteem for another's work, is also not carried out in this instance. The harm to the plagiarizer, where they fail to learn because they copied instead, is likewise absent. If someone were to feel harm just from a copy of their words existing, they wouldn't need you to do it: Google has hastily indexed this along with every other HN comment, and we all know that this whole thread will make its way into LLM training sets eventually.
> google has hastily indexed this

Google doesn't claim authorship over that which they index.

Plagiarism doesn't need to be harmful for it to be bad, and my intent wasn't to harm anyone anyway. My intent was to use the author's exact words to pretend to make a unique take that I claimed to have authored.

I don't understand. In what way is plagiarism bad if it doesn't harm? If it were harmless to pretend you authored a unique take, how is the parent expected to react to you not harming them such that they realize it's bad?
Harmless doesn't imply ethical. Plagiarism that doesn't harm is still lying.
Fair enough, shame on me for assuming utilitarianism.
Can you apply your philosophy to the U.S. dollar? I am sure producing copies is a "theft" that is entirely in your head. You were deprived of nothing by someone else making a copy of your dollar.
But you're still depriving the world of future flowers. Why spend years studying, sacrificing time with others, and living frugally if others can take or monetize the result for free? Most people need compensation to justify their effort, or at least the option to not have their years of work and sacrifice co-opted into an AI-generated ad for toilet bowl cleaner.

No-cost copying doesn't remove the need for compensation to sustain ongoing creation. Society has long treated knowledge, art, and thought as high-value outputs, and accepted the copyright tradeoff to support them. That is long settled, and no 'get rid of copyright' proponents argue satisfactorily why the 300-year corpus of thought on that is invalid. Long copyright terms may justify reform, but not rejection of the established principle that creative work needs economic value to sustain ongoing creation, and that ongoing creation is a net positive and desirable for society.

You are free to release your work copyright-free today. In software, that has unlocked immense value. In other areas, those choosing copyright have unlocked more value. But software is different: I can get hired to build on the free work. No one is hiring an author to expand their book to include fanfiction. And were that the model, it would arguably produce worse results, as we would be back to the much worse patronage system, where Bob hoards what he's paid for and only shares it with friends for status. For 300 years we've understood that, because of these dynamics, paywalled copyright with a throttled side of libraries unlocks the greatest access to knowledge. Eliminating duplication cost has not changed that.

'But I want every flower there is today and I don't care if there are any future flowers' doesn't change that; it's simply a new value judgement that my want or use case today outweighs the cost to society of lost future knowledge creation and a return to a patronage-based reward system. Again, 300 years of thought say that results in a worse outcome for society. How does the typical OSS project that depends on patronage fare? Do we really want to return all knowledge output to that model?

When the LLM presents what it learned as its own thoughts without any attribution, that's the theft.

And you understand that. You're not stupid. This is the thing: AI is convenient for corporations, so you'll make dishonest arguments to justify your unethical behavior. Maybe you even believe what you say, but that's because people will hold on to any flimsy thing that lets them feel like they're good people, not because the reasoning actually makes any sense.

This is why people talking about AI get booed at speeches. There's no conversation to be had: you're not interested in the truth, or what's right, or what's good for anyone but yourself.