Hacker News
by nslsm 77 days ago
You are comparing Anthropic obtaining public data from the Internet to Anthropic leaking their trade secrets and having them distributed by third parties.
11 comments

>obtaining public data from the Internet

Like slurping my open source projects while completely disregarding their licenses. In my case, I'm particularly annoyed by the violation of the spirit of *GPL licenses. So they're no strangers to abusing licensed code (in ways that are technically probably legal, but untested in court).

The thing about trade secrets is that, like all secrets, they stop being secret the instant they’re leaked. You can’t DMCA third parties for distributing your trade secrets. The only one you can sue is the party that was contractually bound not to leak them and then did anyway. Now, copyright is a different thing.
Most of that "public data from the Internet" is subject to licenses, yet their entire business model is built on top of a legally grey algorithm that ingests that licensed code and spits it back out without the license. They have no legal right to any of that code; they're just getting away with it because laws are for the poor.

If you believe any data that is publicly accessible is fair game regardless of licenses, then by that definition, Claude Code's source code is included.

Books3 is public data on the internet in the same way that the Claude Code source code is public data on the internet.

Except Anthropic published the Claude Code source code themselves, while Books3 was not published by the books' original authors.

Anthropic published them in a public S3 bucket. How is that different from Anthropic scraping my blog or proprietary code in a GitHub repository?
Doesn't Anthropic claim that Claude Code is 100% written by Claude? That would obviously mean the code is not copyrightable, so the DMCA does not apply and, logically, these DMCA claims are invalid.
C’mon bro. It isn’t like all AI companies haven’t pirated all research papers, books, magazines, and paywalled content on the Internet.

Either you are being naive AF or you are actively trying to spread discontent. I hope it is the former.

Try asking Gemini for information from workshop manuals that are not publicly available. It will pretty much tell you everything you want to know, but it will refuse to tell you where it got the information.
I mean, Anthropic's code was "public data from the internet" as well. They published it publicly. Accidentally, but they made it public. Fair game, right?
Information still wants to be free
Not just public data.
Remember Google’s book scanning project?