Hacker News
by rkou 641 days ago
And what about the future of social media?

This is such devious, but increasingly obvious, narrative crafting by a commercial entity that has proven itself adversarial to an open and decentralized internet / ideas and knowledge economy.

The argument goes as follows:

- The future of AI is open source and decentralized

- We want to win the future of AI instead, become a central leader and player in the collective open-source community (a corporate entity with personhood for which Mark is the human mask/spokesperson)

- So let's call our open-weight models open-source, and benefit from that image, require all Llama developers to transfer any goodwill to us, and decentralize responsibility and liability, for when our 20 million dollar plus "AI jet engine" Waifu emulator causes harm.

Read the terms of use / contract for Meta AI products. If you deploy it, some producer finds the model spits out copyrighted content, knocks on Meta's door, Meta will point to you for the rest of the court case. If that's the future for AI, then it doesn't really matter whether China wins.

7 comments

> Read the terms of use / contract for Meta AI products. If you deploy it, some producer finds the model spits out copyrighted content, knocks on Meta's door, Meta will point to you for the rest of the court case. If that's the future for AI, then it doesn't really matter whether China wins.

As much as I hate Facebook, I think that seems pretty… reasonable? These AI tools are just tools. If somebody uses a crayon to violate copyright, the crayon is not to blame, and certainly the crayon company is not, the person using it is.

The fact that Facebook won’t voluntarily take liability for anything their users’ users might do with their software means that software might not be usable in some cases. It is a reason to avoid that software if you have one of those use cases.

But I think if you find some company that says “yes, we’ll be responsible for anything your users do with our product,” I mean… that seems like a hard promise to take seriously, right?

This is a bad analogy. The factory producing crayons doesn’t need to ingest hundreds of millions of copyrighted works as a fundamental part of its process to make crayons.

I don’t think it is a bad analogy, it is just separating out the issues.

If the thing required breaking the law to make, it just shouldn’t have been made. But, in that case, Facebook should not accept liability for how their users use the thing. They should just not share it at all, and delete it.

Crayons aren’t made by mashing people’s artwork through a gpu.

Crayons don’t generate content either.

If I download something from megaupload (rip) megaupload is the one that gets in trouble. They are storing, compressing, and shipping that information to me.

The same thing happens with AI, the information is just encoded in the model weights instead of a video or text encoding or whatever. When you download a model, you’re downloading a lossy compressed version of all the data it was trained on.

This seems more like an argument that the model just shouldn’t have been created, or that it shouldn’t be used. If a model is just a lossy compressed version of a bunch of infringing content, why would Facebook (or OpenAI, or anybody else hosting a model and providing an API to it) be in the clear?
To be fair, maybe yes, these models shouldn’t have been created. Well, they have been created, so now we need a novel way to make sure they don’t damage other people’s work. Something like this did not exist before, and therefore needs a new set of rules, which the model creators, with all their might and power, are strongly lobbying against.
Tech likes to follow the “ask for forgiveness, not for permission” motto.

If OpenAI, Facebook, or whoever asked for permission to gobble up all publicly visible data to train a program to output statistically similar data, I don’t believe they would’ve got the permission.

In that sense, I don’t think these models should’ve been made.

I don’t think any of those companies would be in the clear. That’s my point.

AI is a copyright black hole, albeit a useful one.

Let's say a factory builds a mega puzzle from many images shredded to identically-shaped puzzle pieces so you can piece them together as you want or need. Some pieces from some images are omitted due to their closeness in image space or due to them being infrequent enough.

This 70B-piece puzzle is an LLM.

You can reproduce likenesses of any hero from the Marvel universe closely enough using this puzzle. Or you can create

Who is to blame?

AI safety becomes expensive, or even impossible, once you release your models for local inference (not behind an API). Meta AI shifts the responsibility for highly-general, highly-capable AI models to smaller developers, putting ethics, safety, legal, and guard-rails responsibility on innovators who want to innovate with AI (without having the knowledge or resources to do so by themselves) as an "open-source" hacking project.

While Mark claims his Open Source AI is safer, because fully transparent and many eyes make all bugs shallow, the latest technical report makes mention of an internal, secret, benchmark that had to be developed, because available benchmarks did not suffice at that level of capabilities. For child abuse generation, it only makes mention that it investigated this, not any results of these tests or conditions under which it possibly failed. They shove all this liability on the developer, while claiming any positive goodwill generated.

They completely lose their motivation to care about AI safety and ethics if fines punish not them, but those who used the library to build.

Reasonable for Meta? Yes. Reasonable for us to nod along when they misuse open source to accomplish this? No.

I think this could be a somewhat reasonable argument for the position that open AI just shouldn’t exist (there are counter arguments, but I’m not interested enough to do a back and forth on that). If Facebook can’t produce something safe, maybe they shouldn’t release anything at all.

But, I think in that case the failing is not in not taking the liability for what other people do with their tool. It is in producing the tool in the first place.

Perhaps Open AI simply can't exist (too hard and expensive to coordinate/crowd-source compute and hardware). If it can, then, to me, it should and would.

OpenAI produced GPT-2, but did not release it, as it couldn't be made safe under those conditions, when not monitored or patch-able. So it put it behind an API and owned its responsibility.

I didn't take issue with Meta's business methods and can respect its cunning moves. I take issue with things like them arguing "Open Source AI improves safety", so we can't focus on the legit cost-benefits of releasing advanced, ever-so-slightly risky, AI into the hands of novices and bad actors. It would be a failure on my part if I let myself get rigamaroled.

One should ideally own that hypothetical 3% failure rate to deny CSAM requests when still arguing for releasing your model. Heck, ignore it for all I care, but they damn well do know how much this goes up when the model is jailbroken. But claiming instead that your open model release will make the world a better place for children's safety, so there is not even a need to have this difficult discussion?

This strange obsession with synthetic CSAM as the absolute epitome of "AI safety" says more about the collective phobias and sensibilities of our society than about any objective "safety" issues.

Of course, from a PR perspective, it would be extremely "unsafe" for a publicly traded company to release a tool that can spew out pedophile literature, to the point of being an existential threat. Twitter was economically cancelled for much less. But as far as dangerous AI goes, it's one of the most benign and inconsequential failure modes.

Don’t Threads and the Fediverse indicate that they are headed that way for social as well?
The last time we had a corporate romance between an open source protocol/project, "XMPP + Gtalk/Facebook = <3", XMPP was crappy and it was moving too slowly to the mobile age. Gtalk/Messenger gave up on XMPP and evolved their own protocols and stopped federating with the "legacy" one.

I think the success of "Threads + Fediverse = <3" relies on the Fediverse not throwing in the towel and leaving Threads as the biggest player in the space. That would mean fixing a lot of problems that people have with ActivityPub today.

I don't want to say big tech is awesome and without fault, but at the end of the day big tech will be big tech. Let's keep the Fediverse relevant and Meta will continue to support it; otherwise it will be swallowed by the bigger fish.

For some reason, this has made me wonder if we just need more non-classical-social-media fediverse stuff. Like of course people will glom on to Threads, it means they can interact with the network while still being inside Facebook’s walled garden…

I wonder if video game engines could use it as an alternative to Steam or Discord integration.

The problem was not that it was not evolving. The problem was that they decided they had trapped all the users of other networks they could trap.

Slack did the same, killing its XMPP and IRC bridges. I don't see them making a Matrix bridge.

Last I checked, there was a movement among the biggest instances to defederate from Meta's "embrace" stage of the "embrace, extend, extinguish" playbook. I didn't check back to see if it got pushed through.

Given the nature of the fediverse, if it happened or not depends on the instance you use/follow.

It's got nothing to do with Meta's social media business directly. Massive as the FB dataset is, it gets mogged by Google who, what with their advanced non-PHP-based infra and superior coders, basically have way more and way better and way more accessible data... and their own AI CPUs they made, and a bigger cluster, and faster software, and more storage, and so on. Big picture, Google is poised to steamroll Facebook AI-wise, and if not them, then OpenAI+Microsoft.

So Meta says "well, we will buy tons of compute and try to make it distributed," "we'll make the model open and people will fine-tune with data that they found," and so on. Now Google and OpenAI aren't competing versus Meta, they are competing versus Meta + all compute owned by amateurs + all data scraped by all amateurs, which is non-trivial. So it's not so much aspiring to be #1 as kneecapping the competition, which has superior competitiveness. But people love it because the common man wins here for once.

Anyway, eventually, they'll all be open models. In the near future, weaker models will run on a PC, bigger models on the cluster, weakest models on the phone... then just weak models on the phone and bigger on the PC... eventually anything and everything fits on a phone and maybe iWatch. Even Google and OpenAI will have to run on the PC/phone at this point; it wouldn't make sense not to. Then, since people have local access to these devices, it all gets reverse engineered, boom boom boom. Now they're all open.

If it was really open-source you'd be able to just train one yourself.
This sort of puts the whole notion of "open source" at risk.

Code is a single input and is cheap to compile, modify, and distribute. It's cheap to run.

Models are many things: data sets, data set processing code, training code, inference code, weights, etc. But it doesn't even matter if all of these inputs are "open source". Models take millions of dollars to train, and the inference costs aren't cheap either.

edit:

Remember when platforms ate the open web? We might be looking at a time where giants eat small software due to the cost and scale barriers.

> We might be looking at a time where giants eat small software due to the cost and scale barriers.

This assumes that abstractions are no longer possible.

Only if you were a billionaire. These models are starting to be so out of reach for single researchers or even traditional academic research groups.
Maybe the road to heaven is paved with bad intentions.
It's especially rich coming from Facebook who was all for regulating everyone else in social media after they had already captured the market.

Everyone tries this. Apple tried it with lawsuits and patents, Facebook did it under the guise of privacy, OpenAI will do it under the guise of public safety.

There's almost no case where a private company is going to be able to successfully argue "they shouldn't be allowed but we should." I wonder why so many companies these days try. Just hire better people and win outright.

It has been clear from the beginning that Meta's supposed desire for an open source AI is just a coping mechanism for the fact that they got beat out of the gate. This is an attempt to commoditize AI and reduce OpenAI/Google/Whoever's advantage. It is effective, no doubt, but all this wankery about how noble they are for creating an open-source AI future is just bullshit.
You're wrong here. Meta has released state of the art open source ML models prior to ChatGPT. I know a few successful startups (now valued at >$1b) that were built on top of Detectron2, a best-in-class image segmentation model.
It’s because Facebook’s complementary good is content (primary good is ad slots) and if somebody wins the AI race they can pump out enough content to jumpstart a Facebook competitor with a ton of content.
I feel the same way. I'm grateful to Meta for releasing libre models, but I also understand that this is simply because they're second in the AI race. The winner always plays dirty, the underdog always plays nice.
But they've _always_ released their stuff. That's part of the reason why the industry uses PyTorch, that and because it's better than TensorFlow.

In the same way that Detectron and Segment Anything are industry standards.

Sure, for LLMs OpenAI released a product first, but it's not unusual for Meta to release useful models.