by jerf 320 days ago
This post gets the reason why people are cutting off LLMs exactly backwards and consequently completely fails to address the core issue. The whole reason people are blocking LLMs is precisely that they believe it kills the flow of readers to your content. The LLMs present your ideas and content, maybe with super-tiny attribution that nobody notices or uses [1], maybe with no attribution at all, and you get nothing. People are blocking LLMs with the precise intent of trying to preserve the flow to their content, be it commercially, reputationally, whatever.

[1]: https://www.pewresearch.org/short-reads/2025/07/22/google-us...

3 comments

Users don't seek content for the attribution; that's extra noise, unless there's reason to contact the attributed. And given that many websites offer an inefficient flow to content, made of ads and/or unnecessarily animated things for example, the LLM is merely improving the experience for the user.
"Why look at a sunset when you can read a summary about one from an AI?"

-Someone, somewhere, eventually

The whole conversation here is about incentives for the content creators, not the user.

Yes, as a user I'd like everything served to me on a silver platter, for free, on demand, and completely and 100% aligned with my interests exclusively with no thought given to anybody else... but that's not a realistic world. In the real world, if the content providers have no reason to provide content, they won't.

I kind of hate the connotations of "content provider", that neutral term that implies that it is all "content" that can just be measured in megabytes or something, but I mean the full richness here of the term, individual producers, small businesses, big business, everybody. Even my personal site, if I'm not getting something out of it, however intangible it may be, I wouldn't do it. I'd be mighty pissed if I lose a job someday because I get accused of just spewing out LLM content that the LLM can only spew out because of my own original ideas/formulation of ideas being on the internet.

There are many content creators out there who create and share with the only incentive being fun and/or interest. However, their content is in most cases down-ranked due to a lack of SEO, because they have no time to waste on that silliness when there are better things to do. The internet is merely reverting to its previous form where such strings-free content was the order of the day, instead of the highly SEO'd spam that's primarily aimed at getting ad impressions or whatever nonsense that degrades user experience.
Fair, if your content is your product, but I’m more than happy for every LLM on the planet to summarize my page and hype the virtues of my product to its user.
Enjoy the brief window of LLMs "hyping the virtues of your product" to its users for free. In 2030 that's not going to sound realistic at all. And I feel I'm being generous pushing it up to 2030, the first "sponsored training data" either already exists or will probably be out this year, the only question being whether it will be publicly admitted to or not.
No doubt, but that doesn't change the position.
Why do tech bros assume that every site is selling a product? There are blogs, personal web sites, communities, and open-source projects out there.
If there's no product, and it's free, why would one care about it appearing in the output of an LLM? If it's so secret that it shouldn't, then perhaps it should be behind some auth anyway.
Because writing is, in many senses, exposing yourself, and at the very least you want the recognition for it (even if only in the form of a visit to the website, and maybe the interactions that can follow)? Maybe you want at least the prestige that comes with writing good content that took a lot of time to create. Maybe because you want to participate in a community with your stuff. Maybe a million other reasons.

I know that Medium, Substack and the other "publication" platforms (like LinkedIn) are trying to commodify even the act of writing into purely a form of marketing (either for a product, or for your personal brand), but not everyone gave up just yet.

Agreed, and we can argue semantics, but many folks would consider the content in that case a product.
Not everything everyone does is for a profit motive. I'm not trying to sell you anything, myself included, when you visit my site. It's just reading material.
Something being a product does not require a profit motive.
Why would removing your content from LLM training data cause people to go and seek it out directly from you?

Would removing your website from Google search results cause people to go directly to your website?

This seems like a weird comparison - Google’s explicit purpose is to direct people to your site. An LLM’s purpose is not.
The point being made is that just as the search engine was the primary means for users to discover content yesterday, so the LLM agent will become the primary means tomorrow. And that content doesn't have to be in the training data, but if an agent is unable to access some particular content, then it won't be discovered by users. Similar to if a search engine is unable to access it.