Hacker News
by haberman 57 days ago
I'm old enough to remember a time when the primary hacker cause was DRM, the DMCA, patent trolls, export controls for PGP, etc. All things that made it difficult to use information when you want to. "Information wants to be free."

It's wild to see the about-face. Now it's:

> If [companies] can’t source training data ethically, then I see absolutely no reason why any website operator should make it easy for them to steal it.

It would have been very difficult to predict this shift 25 years ago.

14 comments

This claim of contradiction has never worked for me.

Let's say person A wants everyone to be rich.

Person B plots a plan to make themself rich and everyone else poorer.

One can make an argument that any action by A is now a contradiction. If they work with B, it makes a lot of people poorer and not richer. If they work against B, B does not get rich.

However, this is not a contradiction. If a company uses training data in ways that reduce and harm other people's ability to access information, like hiding attribution or misrepresenting the data and sources, people who advocate for free information can have a consistent view and also work against such use. It is not a shift. It is only a shift if we believe that copyright will be removed, works will be given to the public for free, and companies will no longer try to hide and protect creative works and information.

You can certainly argue second-order effects (i.e., we have to restrict information to save information), but the movie studios were making that same argument at the time:

> If copyright can no longer protect the distribution of the work they produce, who will invest immense sums to create films or any other creative material of the kind we now take for granted? Do the thieves really expect new music and movies to continue pouring forth if the artists and companies behind them are not paid for their work?

--Jack Valenti, Motion Picture Association of America, 2000 (https://archive.is/PBy7C)

It sounds remarkably similar to what people concerned about AI say today. How do we make sure that artists get paid?

I don't think many hackers found the argument compelling at the time.

You're taking Jack Valenti at face value. He said "we're here to protect the artists" because the artists were popular and the record labels were not. He was in the business of protecting the labels and screwing the artists and everyone knew it.

The artists were certainly making more money from the studios and record labels than they got from the authors of DeCSS, Napster, BitTorrent, The Pirate Bay, etc.

When Gillian Welch wrote "Everything is Free" in 2001, she wasn't complaining about the record companies, she was complaining about Napster.

> Q: Do you remember where you were when you wrote “Everything is Free”?

> A: I do. I remember exactly where I was and what was going on. It was when Napster was starting to decimate the traditional recording industry dynamic, the viability of making your livelihood [from] your art.

--Gillian Welch, 2018 (https://www.rollingstone.com/music/music-features/gillian-we...)

Most artists were making way more money off the fans (even those downloading music) via touring and merch sales, than they were making off of the labels from residuals. Most were not making anything from residuals.

Valenti was desperate to enlist musicians because people hated the labels and did not feel bad about stealing from them. But the vast majority of musicians were not willing to back the labels against the fans. The few he managed to enlist, like Metallica, were notable because they were exceptions. And the fact that they were already rich and already at the end of their career was noted by many at the time.

In contrast you have, for instance, Courtney Love who wrote a widely-distributed essay about how she and most artists make almost nothing from record sales.

https://www.salon.com/2000/06/14/love_7/

It's an interesting essay, and the TLC case does sound pretty egregious. But the premise is undermined by the fact that Love is worth an estimated $100M today, largely thanks to owning Nirvana's publishing rights, which she inherited from Kurt Cobain.

This is what happens when a culture doesn't have robust exclusionary mechanisms for people who want to burn it down.

We welcomed the vampires in and wonder why our necks hurt.

This is like saying Winner Take All Capitalism doesn't have an exclusionary mechanism for the rich. The system exists for the sole purpose of serving the already-rich. The vampires are an inevitability baked into the system from the start.

We don't technically have "winner take all" capitalism. At least, thanks to some people 90-ish years ago, we had many mechanisms to regulate such situations.

Then more vampires crept in and convinced people that the government they were voted into sucks. So began a campaign to ruin the regulations protecting them from the vampires as they slowly filled their blood banks.

Those people were trying to build a sharing/gift economy. They weren't able to keep bad actors out of their sharing economy. They are bitter that their utopian dreams got hijacked by self-dealers. Why is that wild?

It's highly debatable whether, in the case of an information sharing/gift economy, the concept of "bad actors coming in and ruining it for everybody by taking without giving back" even makes sense.

The information is still there, as is the community that you've built, the joy that you get out of sharing the information, everything you've learned...

Why is any of that diminished, just because some people or entities that you dislike also got something out of it?

I would take up that debate.

Attribution is seemingly a central part of an information sharing/gift economy, and especially in an information sharing/gift community. It is part of the trust that connects people and without it the community falls apart, and with that the economy. AI by its very nature removes attribution.

Accuracy of information is a second critical aspect of information sharing and communities that are built around it. Would Wikipedia as a community and resource work if some articles were just random words? If readers don't trust the site, and editors distrust each other, the community collapses and the value of the information is reduced. It might look like adding AI generated articles would not harm other existing articles, or the joy that editors of the past had in writing them, but the harm is what happens after the community gets flooded by inaccurate information. Same goes for many other information sharing communities.

Source trust and gift attribution are two distinct concepts, I'd say. One happens to the detriment of the taker (or "thief", if that even makes sense, as per my original comment); the other harms the original "producer".

For the former, it is already very much in any AI company's best interest to preserve attribution to become and remain credible.

For the latter, I can't help but wonder whether a gift economy that needs to diligently bookkeep attribution really is one, and if this is the only practicable way to implement one in a given larger society/economy, I'd say this says something important about that society as well.

I make very heavy use of sources that Gemini cites when I use it. I tend to use AI as sort of a mega search engine where I get a little bit of discussion, but if I care even a little bit about the topic, I end up reading the source material anyway.

> AI by its very nature removes attribution.

This is incorrect. RAG preserves attribution. Training data doesn't, but it doesn't make sense to attribute that anyway, unless you want a list of every person who has ever lived.

It's diminished because the hard reality is that you need money to live.

The end result of major tech companies sweeping in, taking everyone's creative work, outcompeting the originals with AI derivatives, and telling every artist on the planet "fuck off, send a job application to McDonalds" is significantly less art.

Copyright was invented to prevent exactly this scenario.

Yes, which is why hackers and artists (at least those mainly publishing instead of mainly performing for a live audience) are ultimately not natural/inherent allies.

Hackers have usually drawn their funding from their (often lucrative) employment, which is what gave them the freedom to give away the products of their hacking for free.

One needs copyright to survive, the other sees it as a means to enforce openness at best (those in favor of copyleft) and as an obstacle to their pursuit (owning the full system, liberating all aspects of and information about it) at worst.

This rift was always visible if you knew where to look, but AI is definitely wedging it wide open.

> whether ... the concept of "bad actors coming in and ruining it for everybody by taking without giving back" even makes sense.

This is pretty clearly answered by the GPL: yes, it does, and this concept has been around since the very beginning.

> The information is still there

True

> as is the community that you've built

Untrue. At this point it's well understood that AI is substitutionary for many of the services that would have once afforded people a way to monetize their production for the community. Without the ability to make a living by doing so, even a small one, people will be limited to doing only what they can in the little free time they get outside of work.

That's the whole problem -- that AI, as it exists today, is taking away from the public, and hurting it at the same time. That's closer to robbery than it is to "sharing in the community".

Yes. There's a difference between walking a trail and maybe littering a few pieces of trash, and walking a trail while actively setting branches on fire.

One scenario is manageable to leave be, or perhaps one or two volunteers clean it up. The fires have an entire trail closed down to everyone.

With some FOSS projects being bombarded by scraping traffic, redoing their PR system, considering ways to limit contributors, and even going closed source, I don't think such a metaphor is an exaggeration.

> utopian dreams got hijacked by self-dealers

Such is the fate of all utopian dreams.

If you're implying that it's a violation of the original hacker ethos, I disagree.

"Information wants to be free" is a small part of the hacker ethos Venn diagram. There are many hacker ethos traits that aren't about cracking, specifically.

Also, the server "information" isn't free (as in beer) to begin with, it costs server availability. Coming up with ways to penalize greedy actors is not only well within the server operator's prerogative, it's an interesting tit-for-tat problem that could pique any hacker's interests.

A bonus hacker trait is that these poisoning responses are individualistic, i.e. the government doesn't get involved, where certainly more aggressive anti-AI sentiments could (wrongly) call for that.

So I'd say this type of LLM-resistance falls squarely in the original hacker ethos, even though it incidentally counteracts one minor aspect of "information availability". Though I'd certainly agree that the picture today is a lot different than it was. Ironic even.

There's a big asymmetry of power here and "information wants to be free" was about empowering the people. Currently, corporations are bad faith actors that corrupted the idea, making it free only for themselves. Can't you see the asymmetry here? For instance, they should release the weights of the models they trained on everyone else's work, but we're not seeing that except for Meta and some other groups.

"Information wants to be free, but only be used by people I wholly endorse." is the motto. You'll see young people singing the praises of piracy but then using "piracy" as an excuse for hating LLMs.

Corporations are not people.

Who works at corporations and benefits from their actions?

If my LinkedIn feed is any indication, bizarre inhuman ghouls who wear the names and profile pictures of my college friends like skin-suits and exclusively post AI-generated marketing materials for AI products.

About a few million fewer Americans than a few years ago, I guess.

For what it's worth, I've generally sort of been on the "information wants to be free" side of things, and I still am. I don't really understand the folks that released their software under an open source license and are now upset that LLMs are training on it -- those folks were pretty quiet when their source code was being indexed by Google. But I suppose that's because Google was sending traffic their way, which they could then monetize. So this is much less about any kind of philosophical argument and much more about who's getting money, which I don't really care about. I view one of the core values of open source software as being something that we can learn from, whether that's through AI or otherwise.

> I don't really understand the folks that released their software under an open source license and are now upset that LLMs are training on it

The key word there is "license." Open source often has strings attached--an obligation to credit the source, an obligation to release derivative code under the same license, etc. LLMs seldom respect the license, they just quietly and extensively plagiarize everything.

It becomes a bit easier to see when you finish the sentence. "Information wants to be free (from ______)." If you filled that blank in with "rent-seeking Capitalists and corporations," you likely have everything you need to understand why they don't see it as a turn.

I say this as someone whose notions exist orthogonal to the debate; I use AI freely but also don't have any qualms about encouraging people to upend the current paradigm and pop the bubble.

Sure, with enough effort, you can find a seemingly clever way to turn almost every mantra into its semantic opposite.

It doesn't take much cleverness because we're talking about a straightforward dynamic. A counter-cultural expression that was a "screw you" aimed at corporations was co-opted and misinterpreted by those same corporations as "It's free real estate", and now the latter are flummoxed that they're not buddies with the former. Well, *points up* that's why.

Hackers are not one big homogeneous group (although there definitely are larger trends, and maybe you have a point there).

Still, people were saying all kinds of inane stuff 25 years ago too.

Politics will make more sense once you realize no one is trying to have consistent principles.

People are in general for whatever they think will benefit them, and against what they think will harm them.

So piracy is ok when it benefits the little guy and not ok when it benefits the big guy. Unions are good when they stand up against employers, and bad when they discriminate against non-union workers. There's no contradiction there.

The common string between both of those advocacies is that they heavily favor huge corporations instead of the little guy.

Basically, DMCA and DRM make you a criminal while protecting NBC and Disney and such. And AI steals your work and allows soulless mega corps to basically take your job.

Personally I'd argue AI is very likely to be worse for the average person, depending on their career.

Some people don't care or maybe don't realize. And then I think some people are just naive, and are assuming everyone else will be fucked, but they won't be. And then some other people are self-destructive, and they know it will make their life harder - but they advocate for it anyway, because they feel they deserve the suffering, and maybe hold some misguided belief that suffering is the fuel of victory.

It was never about some "information wants to be free" philosophy for most people. It was about, "I want information to be free for me to access, and btw fuck big corporations." No real shift happened.

THEN: "You can't violate our copyright because it's ours and belongs to us."

NOW: "We can violate your copyright because we want to."

YOU: "Where's mine, and how do I make more people click on these ads?"

Those people were always lying; it was always about power dynamics. People hated DRM and surveillance because they saw it as punching down. People now hate AI wielded by corpos because they see it as punching down. Extremely few (if any) people ever bought into the “cyber-utopia” thing, and now the mask has completely come off: everyone knows the Internet is a tool for subjugation.