by Max-Ganz-II 22 days ago
To stop this, I a month or two ago put most of my Amazon Redshift research web-site behind a basic auth username/password wall.

It all remains free, but you need to email me for a username and password.

If I put in time and effort to make content and OpenAI et al copy it and sell it through their LLM such that no one comes to me any more, then plainly it makes no sense for me to create that content; and then it would not exist for OpenAI to take, or for anyone else to read. We all lose.

It seems parasitic, and on the face of it, acting to kill the host.

In fact, it essentially seems like abrogation of the concept of private property.

The AI companies can take what I make, without my consent, which they sell for profit, where that profit, it seems to me, was formerly substantially coming to me, the return on my efforts.

I had a look for ways to indicate to AI companies to remove my content. The methods provided are a fig leaf and put the onus on me, and in any event in a way which can never be known to have removed my content - "if you can show your content from a prompt, we will take steps to try to prevent that content from showing".

As a consequence of putting up a username/password wall, Google has profoundly de-ranked the site, and I believe it is basically not being found on search any more.

6 comments

True. It's a massive shift of power, all being centralized.

As you mentioned, they know they need good data though, so they might actually try to find some equilibrium.

If not, it's possible that new valuable content, to feed the LLMs, will be produced in-house by the AI labs. Sounds insane, but Netflix also makes their own content.

I think the AI labs will become so big that they'll take on more roles than just offering LLM inference. I think they'll become as powerful as, or more powerful than, many current nation-state governments.

> True. It's a massive shift of power, all being centralized.

It may seem that way in the short term. But in the long term, the tendency in technical development is for the infrastructure and capital requirements for new technology to start off very high, but then shrink over time, such that use cases that required massive amounts of upfront investment in the early stages become incrementally more viable at smaller and smaller scales.

People were saying the same things about computers in general in the 1960s as they are about GenAI now. That was an era when computer technology itself had developed to a point where it was economically impactful, but still only affordable to large institutions. People making predictions that increasing use of computers would lead to massive centralization of economic and cultural power didn't predict that merely twenty years later, computing power equivalent to contemporary mainframes would be available in a convenient desktop box available at a local shop to any individual or main-street business that cared to buy one.

The widespread availability of computing technology from the '80s to present actually had the opposite effect, and led to quite a bit of decentralization, as enterprising individuals and new startups started applying that technology to do at small scales what only large enterprises could do before. In fact, a lot of the reaction to AI in its current stage may actually be because it's disrupting the expectation of decentralization and autonomy over our technology that the personal computing revolution established in the first place.

Like most new technologies, GenAI in its initial stages has required massive infrastructure investments that have led to the early iterations being offered by centralized institutions, but that might not last. Open-source AI models are approaching the capabilities that the big players' frontier models had arrived at only a couple of years ago.

In 2026, we're already at the point where local inference is economically viable for commercial use cases -- at my own company, while we do use our Claude Enterprise account for a variety of use cases, it turned out to make much more sense in terms of both cost and risk exposure to instead process certain datasets (e.g. a large volume of phone recordings containing PII) with local models running on commodity GPUs. That proved to be entirely effective, and the one-time hardware investment (which created a bookable asset for the company) turned out to be less than the cost of running the same task on Claude (which would have been pure OpEx).

Your positive view makes sense to me and is refreshing. Let's see how things play out. So the pattern will be: that which can be done with smaller models will be decentralized first; gradually the more advanced stuff will come within reach. I already do use Google Search's AI Mode (probably a 300B model) for many quick questions. Local models would be great for things like checking my email and many other things (sensitive + continuously running = not suitable for Opus). My 64GB DDR4 laptop can already run Qwen 47B at 0.7 tok/s, which is already usable for some use cases (overnight stuff mostly).
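To put a rough number on "overnight stuff", here is a back-of-the-envelope sketch. The 0.7 tok/s rate is the figure quoted above; the 8-hour run length and the function name are assumptions made purely for illustration.

```python
def overnight_tokens(tokens_per_second, hours):
    """Approximate number of tokens generated in an unattended run."""
    seconds = hours * 3600
    return round(tokens_per_second * seconds)

# 0.7 tok/s is the rate quoted above; 8 hours is an assumed overnight window.
print(overnight_tokens(0.7, 8))
```

At that rate an 8-hour run yields roughly twenty thousand tokens, which is enough for batch jobs like summarizing a day's email but not for anything interactive.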
> It all remains free, but you need to email me for a username and password.

How will you be sure that it's humans emailing you?

The authentic spelling and grammar errors :-)
I have seen many recipe websites do the same recently. All the big sites require an account too now.
Do they still have the long-winded story about the author's grandfather's apple tree which gave only sour fruits?
That was for SEO/copyright reasons, so I guess not?

This mostly feels like a meme though. Most of the recipes I see have instructions, notes and photos, then a recipe. It's unfortunate that people think of the worst offender and cheer for the death of the independent web.

You must know which sites to go to. As a casual, I encounter the SEO spam every single time.
Welcome to the dark forest.
Sad but good. The internet had become too large anyway. Once upon a time it was a small pond full of interesting life, today it's an unfathomably large ocean. Everywhere you look, it's mostly empty.

The good stuff now is in small private communities, away from the bots and the eternal influx of September newbies to devolve every discussion into memes. Elitism will be good again.

> companies can take what I make, without my consent

Welcome to "democracy". Of course, _we_ decide what "democracy" is and how (and if) we apply it in your unfortunate, individual case.

The multinational companies are a fair bit outside democratic control. If one country bans or restricts them they can operate from some other one.
But in the present moment, it seems like countries are themselves even more outside democratic control than multinational companies are.

The mechanisms of democratic accountability in political institutions today are demonstrably dysfunctional and broken, if they ever really worked at all, whereas multinational companies are at least somewhat beholden to market pressures. Sure, they can engage in jurisdiction shopping when that's viable for them, but it's more often the case that they seek to influence those very governments in order to insulate themselves from accountability to the market.

And many of the bans and restrictions that firms try to avoid by switching jurisdictions are themselves the result of some other industry or special interest group managing to exert stronger influence over the local political institutions, and not due to any sort of "democratic" consensus. Look at the recent cluster of nearly identical age-verification laws passed by jurisdictions around the world, which there was near zero "democratic" impetus for, as an example of this.

Dunno. In democracies you can still vote the bastards out, even if the process is imperfect.

The age verification thing was initially a UK project and people there broadly support it https://yougov.com/en-gb/articles/54405-eight-months-on-thre...

The others copied.

> To stop this, I a month or two ago put most of my Amazon Redshift research web-site behind a basic auth username/password wall.

> It all remains free, but you need to email me for a username and password.

That also creates friction for new users, discoverability issues, and additional privacy concerns for people wanting to access your content.

> I had a look for ways to indicate to AI companies to remove my content.

Even the ones that do provide attribution and links back to the original source? Perplexity does a good job of that, for example.

> As a consequence of putting up a username/password wall, Google has profoundly de-ranked the site, and I believe it is basically not being found on search any more.

Well, yeah, if you're blocking the content from being accessed without a login, you're blocking it from being indexed by search engines.
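As a minimal sketch of why that happens (assuming a standard HTTP Basic auth check; the credentials and function names below are hypothetical, not the site's actual setup): any request without valid credentials gets a 401, and a crawler never sends credentials, so it never sees the page body.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
USERNAME = "reader"
PASSWORD = "example-password"

def is_authorized(authorization_header):
    """Return True if the header carries the expected Basic credentials."""
    prefix = "Basic "
    if not authorization_header or not authorization_header.startswith(prefix):
        return False
    try:
        decoded = base64.b64decode(authorization_header[len(prefix):], validate=True)
    except ValueError:
        # Malformed base64 is treated the same as missing credentials.
        return False
    expected = f"{USERNAME}:{PASSWORD}".encode("utf-8")
    # Constant-time comparison to avoid leaking how much of the value matched.
    return hmac.compare_digest(decoded, expected)

def status_for(authorization_header):
    """HTTP status the server would return for a request with this header."""
    return 200 if is_authorized(authorization_header) else 401

# A search crawler sends no Authorization header at all.
print(status_for(None))
```

From the crawler's side the whole site is a wall of 401 responses, so there is nothing to index.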

I guess I'm a little confused as to what your ultimate goal is. If you're putting content up on the web for free, what are you gaining by blocking AI from indexing it, especially when you're blocking actual users, whether they discover it via AI or traditional search?

I understand the frustration at seeing AI tools digest your content and then repeat it to users without connecting it back to your site. But that's something that other people have always been doing independently of AI -- people read articles, learn facts or understand new ideas from them, and then incorporate them into their general assumptions to be expressed in their own work without necessarily acknowledging, or even recalling, where the underlying information that informed their thinking came from. People have been writing articles and producing various forms of media content that are inspired by other people's unattributed work since time immemorial.

Yes, AI accelerates that process and makes it more visible to you, so I understand where the frustration is coming from. But consider that the expectation that everything that happens downstream of your work will always be attributable back to you may never have been a reasonable one in the first place.