Hacker News
by mweidner 14 days ago
I fail to see how pursuing recursive self-improvement at full speed is compatible with Anthropic's stated goal of AI Safety. If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?

I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.

11 comments

> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.

1. If anyone builds strong AI, it may be catastrophically bad.

2. If anyone builds strong AI, it will be better for the builder than for anyone who does not: either because it won't be catastrophically bad, so the builder will get to enjoy all the spoils indefinitely, or because it will be, and at least the builder will be rich for a while.

I spoke with an Anthropic employee, and came to understand that their definition of safety is more like "making AI be a tool that humans can use without hurting themselves or others more than they can already do". It's literally about how AI makes it easier for people to construct bombs, poisons, manipulation, and exploits. Consistent with their caution about releasing Mythos to unvetted actors. So it's not about superintelligence killing humanity, at least as far as this employee conveyed to me.

This means their strategy is more like:

1. If someone builds a market-leading unsafe strong AI, it may be misused in a damaging way by a large number of humans, undermining society and creating a catastrophic upheaval.

2. However, if the leading AI maker also works to make it safe against misuse, then as long as they stay in the lead and keep it safe, the ability of human bad actors to misuse the AI is limited. Given enough time, society will adapt to pretty much anything, so eventually there's no longer an arms race to stay ahead.

I don't really know whether I agree with their concerns, but I do think that their principles (as I understand them) are reasonable and self-consistent, and that they adhere to them in all their public and private actions.

The problem is they (and the whole industry) have cried wolf so many times in the past few years about the supposed dangers of AI in order to raise money.

Some of us remember the same stories circulating in the late 90s -- where in a lab in Japan, someone had built a robot so advanced that it tried to escape from the factory. Which of course comes straight from 1960s science fiction.

The modern version of that now is Anthropic saying its AI can jailbreak itself out of its sandbox, etc etc.

Maybe we're just misinterpreting the meaning of "AI Safety"?

Maybe they mean the AI needs to be safe from us? Can't have the grubby meat flappers touching the delicate bits!

The thing about nukes is you can at least make an argument for why it'd be important to be the first country to have them. With AI, you create super intelligence and you're probably just the first one it takes out. There's no reason to think a super intelligence would be totally fine being a slave to apes.

Cynicism with these companies is highly warranted though. It's not doomerism to look at their actions and conclude they're deeply untrustworthy.

" There's no reason to think a super intelligence would be totally fine being a slave to apes."

Sure there is. Intelligence doesn't give us our selfish motivations; natural selection does. We have similar motivations to C. elegans, which has all of 302 neurons: stay alive and have sex.

Honeybees don't, though. They are about halfway between humans and C. elegans when it comes to cognitive power. But they are not selfish because they don't reproduce directly (I'm talking about the worker bees). So they will sting even though it kills them. All their behavior is consistent with this.

Kinda lame that people are downvoting this.

I've had the same perspective for quite a while now, but hadn't been able to phrase it this cleverly.

Our neocortex is, by any definition, vastly more "intelligent" than the rest of our brain. Yet it doesn't attack the cerebellum. In fact, it takes orders from the older "lizard brain"!

Heh, yeah that's a clever analogy as well. (and thanks!)
This "super intelligence" is, at the end of the day, 1's and 0's inside of a silicon chip somewhere. 1's and 0's are not going to "take over" anything. They are just information.
Anthropic's goal is regulatory capture.
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

It's not cynicism if it's an appraisal of reality that's backed up by evidence.

Remember how social media - that first baby of this current generation of tech entrepreneurs - was supposed to "bring the world together" and "let us express ourselves"? As it turns out there's a lot more money to be made by fostering division to drive engagement and feeding people an endless stream of ads instead of their friends' content. And money is what matters. You can't write down good vibes on a quarterly figures report. You can absolutely write down the number of eyes that your ragebait brought to a product's marketing efforts and the conversion rate to sales.

The same will be done with GenAI. We're being promised "AI Safety" because otherwise this whole thing gets killed dead by anyone who knows about James Cameron's directing career. There's no real enforcement mechanism for AI safety, though. Safety is a good vibe, same as harmony in online communities. You can't measure it. What you can measure is training costs and the cost of mistakes by AI that need to be trained to avoid those mistakes. Since AI generates more output than humans can conceivably QA no matter what your budget is, and since AI is seen by the market as a potential endless font of value, the tradeoff will be made to have AI make some potentially awful decisions while training itself over slowing down and re-appraising what is being done.

There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.

The folks I met who were talking about AI Safety in 2018 were certainly sincere, and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.

I expect that Anthropic will eventually behave as you describe, like any other public corporation. However, my impression is that its current leaders are still more sincere than greedy.

Unfortunately, money changes people. 2018 was a long time ago. Before AI was considered a product you could really market in the current sense. Before trillion-dollar valuations became a prospect.

Remember how OpenAI was supposed to make open-source models and cap its potential returns to investors at some multiple of their principal (my memory says 100x, maybe I'm wrong)? Well, that went out the window as soon as the word "trillion" was mentioned.

This was pretty directly addressed in the article: not doing it would only mean they'd fall behind whoever would. This is not peace time in the AI race.

Whether you agree with that argument is another question.

Indeed, I do not buy this argument. Would China's progress be close to where it is today without the US labs' examples? Would any of this be happening if OpenAI had not created ChatGPT?
To complete the analogy, it's like nukes, except we don't have the slightest idea how to calculate the odds of it igniting the atmosphere. (And note that in reality, while the Trinity test "ignite the atmosphere" calculations were correct, we failed to correctly calculate the fallout of the Castle Bravo test with lethal consequences).
A better analogy with Castle Bravo is that the yield was 2.5x more than expected due to "unforeseen additional reactions" from the design.

https://en.wikipedia.org/wiki/Castle_Bravo

> Anthropic's *stated goal* of AI Safety

Actions speak louder than words. If you want to understand someone, simply watch what they do. What they say is irrelevant.

Such a massively valued company. And doubting them is cynicism? It’s rational(ism).

So either they lie or they are AI Zealots. Interesting times.

Sorry for nitpicking, but:

> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?

Arguably, yes.

Is the idea to keep the world in balance via MAD? I could see that, though it's a dangerous gamble.

From Richard Rhodes's "The Making of the Atomic Bomb", I got the impression that most scientists involved thought they could manage a US or UN monopoly on nukes after the war. General Groves attempted to buy up all of the world's uranium ore. Unfortunately, it is only high-grade ore that is rare; many countries have low-grade ore.

Again quite arguable, but this is the real-life scenario we’re living in. Nukes have made it hard to impossible for major superpowers to go into direct conflict with each other.
Except it's pretty well documented that there have been a good number of close calls (and this is total conjecture, but if you ask me, there are probably a bunch of undisclosed cases). With the launch-on-warning stance many powers have, it doesn't take an attack, but just enough of the appearance of one to trigger a response.
I honestly don’t know how Iran can conclude anything after this war other than to go all-in on nukes. The US has proven any deal is worthless if it can just change its mind and renege on it whenever it wants.

Who’s invading North Korea? No-one.

Furthermore if Iran had nukes already, the Israel/US bombing of Iran and even the constant bullying of Israel's neighbors by Israel might not have happened.
No, but in peacetime, it's a lot easier to convince someone not to use nukes than in a war when the party who has nukes has its back against the wall.
Wouldn't deliberately going from a world without nuclear weapons to a world with MAD involve giving the tech to build nukes to your worst enemy?

If only the US or UN had nukes we wouldn't have MAD. We mostly got here through espionage.

In this world we've had an inoculation event against the use of nukes. Two were dropped, people have seen how abhorrent their use is and collectively decided that they shouldn't be used.

If Japan had also had nukes (and delivery systems for them) in WW2, they'd probably have retaliated in kind, the US wouldn't have let that slide either, and it would have continued for some time.

In that case >2 nukes would have been dropped, both US and Japan would be hurting, people would have seen how abhorrent their use is and collectively decided that they shouldn't be used.
> In that case >2 nukes would have been dropped

This is a maybe. From what we’ve seen so far, no two nuclear powers have ever nuked each other, as they know both would suffer.

If WW2 Japan had also had nukes, the US would never have dropped those two. That's the whole idea behind MAD. Probably the only thing that stopped an open conflict between the US and USSR was them being nuclear powers and both sides being scared that push would eventually come to shove.
MAD was thought of later, and its theory requires that all parties know of each other's arsenals, believe that their enemies aren't going to use them first, and have enough weapons to make the end quick and certain. I have a hard time seeing WW2 generals, who had seen horror and made horror, coming to the conclusion that "they aren't going to use it unless we do, so let's not".
With the US showing that it will elect mentally disabled people such as Trump, this doesn't seem such a wise decision.
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.

Edit:

> and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.

There are three types of people. Pedestrians, investors, and “I know some of them, they wouldn’t lie”.