by rpgwaiter 617 days ago
There are many good reasons, but for me it’s that I don’t want companies profiting off of my posts that have no intention of profit sharing.

When I make a YouTube video and companies run ads on it, I get a piece of that pie (assuming I meet the requirements, etc.).

That same video fed into Gemini so Google can charge for AI video generation? I get nothing, Google makes bank. As a user I can pay for YouTube Premium and not see ads, but as a creator there’s no amount I can pay to not feed Gemini.

4 comments

> That same video fed into Gemini so Google can charge for AI video generation? I get nothing, Google makes bank.

How do you feel about commenting on this site for free, which probably provides some benefit to Y Combinator?

I would happily pay a reasonable monthly subscription to this site or similar. No problem paying for a service that treats users with respect. Also, using this site has taught me all sorts of things that likely made me some money indirectly. It seems mutually beneficial, without HN selling my data and without me paying for HN.

That said, if YC made a deal with some tech company to give them the firehose of data to train AI, I’d probably stop using HN. I stopped using Reddit for a similar reason despite being a very frequent redditor with like 60k karma. I know it’s all pretty open and getting fed into many different LLMs anyway, but that’s not necessarily YC’s fault.

My ideal would be strong government regulations regarding AI training, requiring explicit opt-in that isn’t buried in a ToS or EULA. Ideally companies would be required to offer a “non-AI feeding” version of their website to legally operate in my country.

I can’t imagine a scenario where this happens in the current system, but I sure can fantasize.

Right, but are you objecting to AI training because companies are benefiting while you go uncompensated, or because you think AI training is fundamentally bad? Your previous comment suggests it's the former, but by the same logic you shouldn't comment online either, because that also benefits the company and is uncompensated.

Users are here because they want to be; they chose to participate in this community. You can stop using HN at any minute and Y Combinator will not chase you.

But AI companies, on the other hand, took the internet hostage. They stole any creative work, code, art, and literally any data they could get their hands on with no regard to license or consent from users. No one actively opted in to let AI companies have their personal data; they just silently grabbed everything they could. Maybe there's some obscure website where you shared something private and lost access to the account, or the website even went down? Congrats, it's now revived in an OpenAI dataset where you have absolutely no control over, or details about, how it's being used, and not even a way to make a request or pursue legal action because the training data is a "secret".

It's not about compensation, it's the fact that you have no option or say even if you don't use their services.

You can't escape or opt-out, unless you go off the grid, and even then they still retain your old data and use it as they see fit.

They already said that they believe commenting here has been mutually beneficial, but anyway this is a false dichotomy. One could be neutral on AI in general but feel negative towards training proprietary, privacy-invasive AI models that will for sure be used to make their content creation less relevant.

I get the satisfaction of telling off strangers

I'm pretty sure for antitrust reasons Google is (ironically) about the only AI company that is not training generative AI on YouTube content. So when you make a YouTube video and people train AI on your video and generate stuff using it, you only get 1 click of benefit out of it, from their first download.

> for me it’s that I don’t want companies profiting off of my posts that have no intention of profit sharing

I won't argue with your position, but most people would be hurting themselves more than they would hurt the companies, even in absolute terms.

> I don’t want companies profiting off of my posts

I despise this attitude. It's so entitled.

Our history of forever extending copyrights and protecting "intellectual property" has run amok, to the point where the average person thinks their scribbles, utterings, and ideas are valuable enough on their own to be worthy of a pay day. It's the culture of "My cut, my cut, my cut!"

Someone else profiting is not a tragedy to get up in arms about. The fact that you were somehow, tangentially, kinda sorta in the vicinity of that profit does not and should not mean you are owed money.

If you want to talk about privacy, sure, that's an issue worth bringing up. But, "I'm only mad because someone else made money and I didn't get paid," has nothing to do with privacy. It's pure greed, entitlement, and envy.

You know what? If you want to profit, do something to create value. Write a book. Start a paid newsletter. Create a startup. Put on a show and charge admission. Nobody is stopping you.

But if someone else figured out how to use a snippet of a comment you made 10 years ago as one-quadrillionth of the training data for a powerful LLM… if someone else figured out how to use your publicly-shared social media posts to attract advertisers to a platform they built… if someone else used 6 notes from a song you once sang to create a smash hit… kudos to them. They created something of value. You should've and could've done it yourself. Hell, you still can.

But nobody should owe you money. We should not have a society where people who actually create stuff are subject to endless friction and threats from do-nothings and patent trolls demanding "my cut" if the metadata from their words or actions contributes to 0.0001% of someone else's idea that they turned into profit with hard work.

> You know what? If you want to profit, do something to create value. Write a book. Start a paid newsletter. Create a startup. Put on a show and charge admission. Nobody is stopping you.

I believe that the GP's complaint is that their content online is actually being scraped and turned into value for companies, they would want compensation for it.

I'm personally of two minds on this: posting public content online includes no guard rails for how it's used. I also disagree strongly with LLM companies throwing mountains of resources at scraping the web, though. If nothing else, it feels very much like a monopolistic play, leveraging massive power in those resources to create a competitive edge that other players can't compete with.

> I believe that the GP's complaint is that their content online is actually being scraped and turned into value for companies, they would want compensation for it.

And the comment directly addresses that. If someone creates a valuable thing and it has a minuscule pinch of your content inside it, you shouldn't be complaining or demanding payment. That's how participating in culture is supposed to work. When someone copies you orders of magnitude more directly, that's when you should be compensated or have control over it.

Since the web was widely scraped to train LLMs, I have to assume that the entirety of what I had up on the web was included. That's more than a "minuscule pinch". I consider it to be wholesale abuse. For me, money doesn't enter into it at all.

However, there's literally nothing I can do about it aside from withdrawing from the public web -- which is what I've done, aside from writing comments here. Until/unless there is some sort of effective way of defending against the crawlers, the open web is no longer a suitable place to publish anything.

There's never going to be a way to defend against crawlers and still have an open web. Good actors may respect conventions like a robots.txt file but that's ultimately just a polite request.

You could get further trying to block by user agent headers, known crawler IPs, etc., but then you're just taking up the same fight advertisers have with ad blockers.
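To make the difference concrete, here is a rough Python sketch of the two approaches, not a real defense. The crawler name `ExampleAIBot`, the robots.txt contents, and both helper functions are made up for illustration; only `urllib.robotparser` is a real standard-library module. The first function is the check a well-behaved crawler runs on itself, and the second is a naive server-side filter on the User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "ExampleAIBot" is an invented crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """What a well-behaved crawler checks before fetching a page.

    Nothing forces a crawler to run this check at all.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical server-side blocklist, matched against the User-Agent header.
BLOCKED_AGENT_SUBSTRINGS = ("exampleaibot",)


def server_should_block(user_agent_header: str) -> bool:
    """Naive server-side filter: block if the header names a listed crawler."""
    ua = user_agent_header.lower()
    return any(name in ua for name in BLOCKED_AGENT_SUBSTRINGS)
```

A crawler that sends an honest header gets caught by `server_should_block`, but one that simply claims to be an ordinary browser sails through, which is why this turns into the same arms race.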

> You could get further trying to block by user agent headers

That's a game of whack-a-mole, though, and when the scraped data is being used to train LLMs, then a single miss is a really huge problem. That's why I gave up on that approach and took my sites off of the open web until some effective defense becomes possible.

The complaints I see are almost always aimed at the output of an LLM, and that only contains a significant amount of a work when it breaks.

Going after the LLM itself, not the output, is a lot trickier. Anyone can make a big database of public website contents. And if they use it to make a search engine for example, that gets classified as entirely legitimate. If we're excluding the output of the LLM, what's the difference?

Also, if you scrunch the training data down into a small model, it mathematically can't contain very much of the input text.

> Going after the LLM itself, not the output, is a lot trickier.

Exactly so, and this is why withdrawing from the open web is the only realistic solution at this time.

That's a totally reasonable take, though it is just one opinion. I wouldn't tell someone they can't complain or feel entitled to payment for the value they created, though I bet we both agree that posting publicly online offers no expectation of payment by anyone coming across your content.

OP is not saying they want money. They say they don't want companies profiting from their work. The two are unrelated here.

Even worse. It's not that they want to make things better for themselves, but that they want to make things worse for others. Why?

I couldn't have put it better myself.

If you don't want others to use what you say to make money... Shut up.