Hacker News
by bee_rider 641 days ago
> Read the terms of use / contract for Meta AI products. If you deploy it, some producer finds the model spits out copyrighted content, knocks on Meta's door, Meta will point to you for the rest of the court case. If that's the future for AI, then it doesn't really matter whether China wins.

As much as I hate Facebook, I think that seems pretty… reasonable? These AI tools are just tools. If somebody uses a crayon to violate copyright, the crayon is not to blame, and certainly the crayon company is not, the person using it is.

The fact that Facebook won’t voluntarily take liability for anything their users’ users might do with their software means that software might not be usable in some cases. It is a reason to avoid that software if you have one of those use cases.

But I think if you find some company that says “yes, we’ll be responsible for anything your users do with our product,” I mean… that seems like a hard promise to take seriously, right?


This is a bad analogy. The factory producing crayons doesn’t need to ingest hundreds of millions of copyrighted works as a fundamental part of its process to make crayons.
I don’t think it is a bad analogy, it is just separating out the issues.

If the thing required breaking the law to make, it just shouldn’t have been made. But, in that case, Facebook should not accept liability for how their users use the thing. They should just not share it at all, and delete it.

Crayons aren’t made by mashing people’s artwork through a gpu.

Crayons don’t generate content either.

If I download something from megaupload (rip) megaupload is the one that gets in trouble. They are storing, compressing, and shipping that information to me.

The same thing happens with AI, the information is just encoded in the model weights instead of a video or text encoding or whatever. When you download a model, you’re downloading a lossy compressed version of all the data it was trained on.

This seems more like an argument that the model just shouldn’t have been created, or that it shouldn’t be used. If a model is just a lossy compressed version of a bunch of infringing content, why would Facebook (or OpenAI, or anybody else hosting a model and providing an API to it) be in the clear?
To be fair, maybe yes, these models shouldn’t have been created. Well they have been created so now we need a new novel way to make sure they don’t damage other people’s work. Something like this did not exist before, and therefore needs a new set of rules that the model creators, with all their might and power, are trying to strongly lobby against.
Tech likes to follow the “ask for forgiveness, not for permission” motto.

If OpenAI, Facebook, or whoever asked for permission to gobble up all publicly visible data to train a program to output statistically similar data, I don’t believe they would’ve got the permission.

In that sense, I don’t think these models should’ve been made.

I don’t think any of those companies would be in the clear. That’s my point.

AI is a copyright black hole, albeit a useful one.

Let's say a factory builds a mega puzzle from many images shredded to identically-shaped puzzle pieces so you can piece them together as you want or need. Some pieces from some images are omitted due to their closeness in image space or due to them being infrequent enough.

This 70B-piece puzzle is an LLM.

You can reproduce the likeness of any hero from the Marvel universe closely enough using this puzzle. Or you can create

Who is to blame?

AI safety becomes expensive, or even impossible, once you release your models for local inference (not behind an API). Meta AI shifts the responsibility for highly general, highly capable AI models to smaller developers, putting ethics, safety, legal, and guard-rail responsibility on innovators who want to innovate with AI (without having the knowledge or resources to do so by themselves) as an "open-source" hacking project.

While Mark claims his Open Source AI is safer, because it is fully transparent and many eyes make all bugs shallow, the latest technical report mentions an internal, secret benchmark that had to be developed because available benchmarks did not suffice at that level of capabilities. On child abuse content generation, it mentions only that this was investigated, not any results of those tests or the conditions under which the model possibly failed. They shove all this liability onto the developer, while claiming any goodwill generated.

They completely lose their motivation to care about AI safety and ethics if fines punish not them but those who used the library to build.

Reasonable for Meta? Yes. Reasonable for us to nod along when they misuse open source to accomplish this? No.

I think this could be a somewhat reasonable argument for the position that open AI just shouldn’t exist (there are counter arguments, but I’m not interested enough to do a back and forth on that). If Facebook can’t produce something safe, maybe they shouldn’t release anything at all.

But, I think in that case the failing is not in not taking the liability for what other people do with their tool. It is in producing the tool in the first place.

Perhaps Open AI simply can't exist (too hard and expensive to coordinate/crowd-source compute and hardware). If it can, then, to me, it should and would.

OpenAI produced GPT-2, but initially withheld the full model, arguing it couldn't be made safe under those conditions, when not monitored or patchable. With later models, it put them behind an API and owned its responsibility.

I didn't take issue with Meta's business methods and can respect its cunning moves. I take issue with things like them arguing "Open Source AI improves safety", so we can't focus on the legit cost-benefits of releasing advanced, ever-so-slightly risky, AI into the hands of novices and bad actors. It would be a failure on my part if I let myself get rigamaroled.

Ideally, one should own that hypothetical 3% failure rate at denying CSAM requests while still arguing for releasing the model. Heck, ignore it for all I care, but they damn well do know how much this goes up when the model is jailbroken. But claiming instead that your open model release will make the world a better place for children's safety, so there is not even a need to have this difficult discussion?

This strange obsession with synthetic CSAM as the absolute epitome of "AI safety" says more about the collective phobias and sensibilities of our society than about any objective "safety" issues.

Of course, from a PR perspective, it would be extremely "unsafe" for a publicly traded company to release a tool that can spew out pedophile literature, to the point of being an existential threat. Twitter was economically cancelled for much less. But as far as dangerous AI goes, it's one of the most benign and inconsequential failure modes.