by csallen 630 days ago
> I don’t want companies profiting off of my posts

I despise this attitude. It's so entitled.

Our history of forever extending copyrights and protecting "intellectual property" has run amok, to the point where the average person thinks their scribbles, utterings, and ideas are valuable enough on their own to be worthy of a pay day. It's the culture of "My cut, my cut, my cut!"

Someone else profiting is not a tragedy to get up in arms about. The fact that you were somehow, tangentially, kinda sorta in the vicinity of that profit does not and should not mean you are owed money.

If you want to talk about privacy, sure, that's an issue worth bringing up. But, "I'm only mad because someone else made money and I didn't get paid," has nothing to do with privacy. It's pure greed, entitlement, and envy.

You know what? If you want to profit, do something to create value. Write a book. Start a paid newsletter. Create a startup. Put on a show and charge admission. Nobody is stopping you.

But if someone else figured out how to use a snippet of a comment you made 10 years ago as one-quadrillionth of the training data for a powerful LLM… if someone else figured out how to use your publicly-shared social media posts to attract advertisers to a platform they built… if someone else used 6 notes from a song you once sang to create a smash hit… kudos to them. They created something of value. You should've and could've done it yourself. Hell, you still can.

But nobody should owe you money. We should not have a society where people who actually create stuff are subject to endless friction and threats from do-nothings and patent trolls demanding "my cut" if the metadata from their words or actions contributes to 0.0001% of someone else's idea that they turned into profit with hard work.


> You know what? If you want to profit, do something to create value. Write a book. Start a paid newsletter. Create a startup. Put on a show and charge admission. Nobody is stopping you.

I believe that the GP's complaint is that their content online is actually being scraped and turned into value for companies, and that they would want compensation for it.

I'm personally of two minds on this: posting public content online includes no guardrails for how it's used. I also disagree strongly with LLM companies throwing mountains of resources at scraping the web, though. If nothing else, it feels very much like a monopolistic play, leveraging massive resources to create a competitive edge that other players can't match.

> I believe that the GP's complaint is that their content online is actually being scraped and turned into value for companies, and that they would want compensation for it.

And the comment directly addresses that. If someone creates a valuable thing and it has a minuscule pinch of your content inside it, you shouldn't be complaining or demanding payment. That's how participating in culture is supposed to work. When someone copies you orders of magnitude more directly, that's when you should be compensated or have control over it.

Since the web was widely scraped to train LLMs, I have to assume that the entirety of what I had up on the web was included. That's more than a "minuscule pinch". I consider it to be wholesale abuse. For me, money doesn't enter into it at all.

However, there's literally nothing I can do about it aside from withdrawing from the public web -- which is what I've done, aside from writing comments here. Until/unless there is some sort of effective way of defending against the crawlers, the open web is no longer a suitable place to publish anything.

There's never going to be a way to defend against crawlers and still have an open web. Good actors may respect conventions like a robots.txt file but that's ultimately just a polite request.
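To make the "polite request" point concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt rules and the crawler name `ExampleAIBot` are made up for illustration. Note that the check runs on the crawler's side: a crawler that never calls it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleAIBot" is a made-up crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url.

    This is the check a well-behaved crawler performs voluntarily.
    The server has no way to force a crawler to run it.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.com/post"))
    print(is_allowed("SomeOtherBot", "https://example.com/post"))
```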

You could get further trying to block by user agent headers, known crawler IPs, etc., but then you're just taking up the same fight advertisers have with ad blockers.
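As a rough sketch of what that kind of blocking looks like, assuming a hand-maintained blocklist: the agent substrings and the IP range below are hypothetical (the range is a reserved documentation block), and `should_block` is an illustrative helper, not part of any real server.

```python
import ipaddress

# Hypothetical blocklists, maintained by hand.
BLOCKED_AGENT_SUBSTRINGS = ("exampleaibot", "hypotheticalscraper")
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range

def should_block(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request matches a known crawler agent or IP range."""
    ua = user_agent.lower()
    if any(fragment in ua for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in network for network in BLOCKED_NETWORKS)

if __name__ == "__main__":
    # A crawler that announces itself is caught by the agent check.
    print(should_block("ExampleAIBot/1.0", "198.51.100.7"))
    # A crawler sending a browser-like agent from an unlisted IP is not.
    print(should_block("Mozilla/5.0", "198.51.100.7"))
```

The second call is the weak spot: the User-Agent header is whatever the client chooses to send, so the list only catches crawlers that identify themselves or come from addresses already known.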

> You could get further trying to block by user agent headers

That's a game of whack-a-mole, though, and when the scraped data is being used to train LLMs, a single miss is a really huge problem. That's why I gave up on that approach and took my sites off the open web until some effective defense becomes possible.

The complaints I see are almost always aimed at the output of an LLM, and that only contains a significant amount of a work when it breaks.

Going after the LLM itself, not the output, is a lot trickier. Anyone can make a big database of public website contents. And if they use it to make a search engine for example, that gets classified as entirely legitimate. If we're excluding the output of the LLM, what's the difference?

Also, if you scrunch the training data down into a small model, it mathematically can't contain very much of the input text.
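A back-of-the-envelope version of that argument, in Python. All the sizes here are hypothetical, picked only to show the scale, and the ratio ignores the fact that text is compressible, so treat it as an order-of-magnitude estimate rather than a strict bound.

```python
def weight_to_corpus_ratio(num_params: float, bytes_per_param: float,
                           corpus_bytes: float) -> float:
    """Ratio of the model's weight storage to the training text size, capped at 1.

    A ratio far below 1 means the weights cannot hold more than a small
    fraction of the corpus verbatim.
    """
    return min(1.0, (num_params * bytes_per_param) / corpus_bytes)

if __name__ == "__main__":
    # Hypothetical numbers: 1 billion parameters at 2 bytes each,
    # trained on 10 terabytes of text.
    ratio = weight_to_corpus_ratio(1e9, 2, 10e12)
    print(f"{ratio:.4%}")
```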

> Going after the LLM itself, not the output, is a lot trickier.

Exactly so, and this is why withdrawing from the open web is the only realistic solution at this time.

That's a totally reasonable take, though it is just one opinion. I wouldn't tell someone they can't complain or feel entitled to payment for the value they created, though I bet we both agree that posting publicly online offers no expectation of payment by anyone coming across your content.

OP is not saying they want money. They say they don't want companies profiting from their work. The two are unrelated here.

Even worse. It's not that they want to make things better for themselves, but that they want to make things worse for others. Why?

I couldn't have put it better myself.

If you don't want others to use what you say to make money... Shut up.