by drooby 104 days ago
No offense, but this comment makes virtually zero contact with reality.

Our entire civilization runs on your "polite lie" of owning non-physical things. Patents, copyrights, trade secrets, licensing agreements, NDAs. Trillion dollar companies are built on the legal enforceability of intellectual property. The software you're using to type this comment exists because someone owns the code.

Calling information "entropy" doesn't make contract law disappear. We decided collectively that people and institutions can own ideas, and we built the modern economy on that decision. You can argue that's a fiction, but it's a fiction that everything around you depends on.

You can't invoke "universal laws of information" to dismiss public claims to training data while the companies training on it aggressively enforce their own IP. They patent their architectures. They copyright their outputs. They sue competitors for misuse. They clearly believe in ownership of non-physical objects when it benefits them.

You don't get it both ways.

2 comments

I agree with about 85% of what you said, but this:

> We decided collectively

"We" didn't decide that copyright would be 75 years past the death of the author's heirs. Powerful corporations that have lobbyist representation "collectively decided" that on our behalf. In 2011 they were trying to put all this copyright law under the Trans-Pacific Partnership, making it an international issue and expressly taking away the rights of the people to change it. For most citizens, the original term of 7 years was enough before a work became public domain.

If citizens had real representation and not FAANG capital and lobbying, they could easily vote to tax AI, and most of them would.

(Not intending to be snarky, but this isn't my area of knowledge in the least.) Didn't the AI organizations "get it both ways" when they trained on a vast collection of works under copyright and then purely "own" the outcome?

That's not snarky at all, that's exactly the point. They did get it both ways.

The comment I was responding to argued that ownership of non-physical things is basically a "polite lie" and that information is just entropy that belongs to whoever can capture it. My point was that the AI companies clearly don't believe that when it applies to them. They patent their architectures, copyright their outputs, sue competitors for IP violations, and lock down their model weights. They fully believe in ownership of non-physical things.

But when it comes to the billions of people whose work they trained on? Suddenly information is free-flowing entropy that belongs to no one.

That's the asymmetry at the heart of this. The rules around IP apparently apply when they protect the companies' profits, but not when they would obligate those companies to share the profits with the people whose work made them possible. Which is exactly why the public needs to assert a claim now, before that asymmetry gets any more entrenched.

Addition:

Also worth knowing: collective intellectual property already exists. ASCAP and BMI have been doing exactly this for decades. Individual songwriters can't enforce their rights every time their music gets played, so they pool their IP, license it collectively, and distribute the revenue. The problem they solved is almost identical to the training data problem. Each individual contribution is tiny, but the collective value is enormous. Applying this at the scale of the general public would be novel, but the underlying mechanism isn't. The concept works. It just hasn't been applied to training data yet.

Interesting analogy.
I mean, the AI companies want it this way, but the same laws of information apply to them too. They can patent whatever they want, but as we see, other nations use their models to distill information into other models, and there is almost nothing they can do about it.

Patents, copyright, and lawsuits are all post hoc actions, which means the milk has already been stolen. And they only work if the rule of law is respected, which isn't going so well lately.

We are seeing this in that there is little to no moat between the models; nearly everyone with the needed compute seems to catch up pretty quickly. And when said rivalries cross national borders, the only solution to these problems quickly becomes violence.

Given how information works, AI wins this game in the long run. Individual humans scale poorly, and their ability to individually acquire information is slow. Looking at this on a company-by-company basis is not the proper way to show how the future with models is going to play out.

This is interesting. As a naive user I’ve gotten the gut feeling of commoditization among the models. I assumed the data center capacity push is intended to be the differentiator but that still seems utility-like over time. (and the data centers in space concept seems like good PR and IR, but to me, technically… ambitious)