by beeboobaa3 772 days ago
How to deal with "unlearning" is the problem of the org running the illegal models. If I have submitted a GDPR deletion request, you had better honor it. If it turns out you stole copyrighted content, you should be punished for that. No one cares how much it might cost you to retrain your models. You put yourself in that situation to begin with.

Exactly, I think that is where it eventually leads. That is what my original comment meant as well: "delete it" rather than applying some technique to "unlearn it", unless you can claim the unlearning is 100% accurate.
> No one cares how much it might cost you to retrain your models.

Playing tough? But it's misguided. "No one cares how much it might cost you to fix the damn internet"

If you wanted to retroactively fix facts, even if that could be achieved on a trained model, they would still come back via RAG or web search. But we don't ask pure LLMs for facts and news unless we are stupid.

If someone wanted to pirate content, it would be easier to use Google search or torrents than generative AI. It would be faster, cheaper and higher quality. AIs are slow, expensive, rate limited and lossy. AI providers have built-in checks to prevent copyright infringement.

If someone wanted to build something dangerous, it would be easier to hire a specialist than to ChatGPT their way into it. Everything LLMs know is also on Google Search. Achieve security by cleaning the internet first.

The answer to all AI data issues (PII, copyright, dangerous information) comes back to Google search offering links to it and websites hosting this information online. You can't fix AI without fixing the internet.

What do you mean, playing tough? These are existing laws that should be enforced. The number of people whose lives were ruined by the American government because they were deemed copyright infringers is insane. The US has made it clear that copyright infringement is unacceptable.

We now have a new class of criminals infringing on copyright on a grand scale via their models, and they seem desperate to avoid prosecution, hence all this bullshit.

1. You are assuming just training a model on copyrighted material is a violation. It is not. It may be under certain conditions but not by default.

2. Why should we aim for harsh punitive punishments just because it was done so in the past?

> 1. You are assuming just training a model on copyrighted material is a violation. It is not. It may be under certain conditions but not by default.

Using copyrighted content for commercial purposes should be a violation if it's not already considered to be one. No different from playing copyrighted songs in your restaurant without paying a licensing fee.

> 2. Why should we aim for harsh punitive punishments just because it was done so in the past?

I'd be fine with abolishing, or overhauling, the copyright system. This double standard, with harsh penalties for consumers and small companies but not for big tech, is bullshit, though.

> Using copyrighted content for commercial purposes should be a violation

So reading a book and using its contents to help you in your job would be a violation too, based on your logic?

A business cannot read a book, and your machine learning model is not given human rights.