by visarga 773 days ago
> No one cares how much it might cost you to retrain your models.

Playing tough? But it's misguided. "No one cares how much it might cost you to fix the damn internet"

If you wanted to retro-fix facts, even if that could be achieved on a trained model, the information would still come back in by way of RAG or web search. But we don't ask pure LLMs for facts and news unless we are stupid.

If someone wanted to pirate content, it would be easier to use Google search or torrents than generative AI. It would be faster, cheaper and higher quality. AIs are slow, expensive, rate limited and lossy. AI providers have built-in checks to prevent copyright infringement.

If someone wanted to build something dangerous, it would be easier to hire a specialist than to ChatGPT their way into it. Everything LLMs know is also on Google Search. Achieve security by cleaning the internet first.

The answer to all AI data issues - PII, Copyright, Dangerous Information - comes back to the issue of Google search offering links to it, and websites hosting this information online. You can't fix AI without fixing the internet.


What do you mean playing tough? These are existing laws that should be enforced. The number of people's lives ruined by the American government because they were deemed copyright infringers is insane. The US has made it clear that copyright infringement is unacceptable.

We now have a new class of criminals infringing on copyright on a grand scale via their models, and they seem desperate to avoid prosecution, hence all this bullshit.

1. You are assuming just training a model on copyrighted material is a violation. It is not. It may be under certain conditions but not by default.

2. Why should we aim for harsh punitive punishments just because it was done so in the past?

> 1. You are assuming just training a model on copyrighted material is a violation. It is not. It may be under certain conditions but not by default.

Using copyrighted content for commercial purposes should be a violation if it's not already considered to be one. No different from playing copyrighted songs in your restaurant without paying a licensing fee.

> 2. Why should we aim for harsh punitive punishments just because it was done so in the past?

I'd be fine with abolishing, or overhauling, the copyright system. This rules with harsh penalties for consumers/small companies but not for bigtech double standard is bullshit, though.

> Using copyrighted content for commercial purposes should be a violation

so reading a book and using the book contents to help you in your job would be a violation too based on your logic

A business cannot read a book, and your machine learning model is not given human rights.

> A business cannot read a book

Assume the human read the book as part of their job. Is that using copyrighted material for commercial purposes?

If that doesn't count then I'm not sure why you brought up "commercial purposes" at all.

> This rules with harsh penalties for consumers/small companies but not for bigtech double standard is bullshit, though.

Consumers and small companies get away with small copyright violations all the time. And those violations are still bigger than having your image be one of millions in a training set.

A business is... made of people.