by Isamu 312 days ago
This has been happening for a long time. I first noticed this with the hand waving dismissals of older concepts like Asimov’s laws.

Not a carefully reasoned argument why “not causing harm to a human” is outmoded, but just pushing it aside. I would love to see a good reasoned argument there.

No, instead there is an avoidance of talking about harm to humans. Just because harm is broad doesn’t get you out of having to talk about it and deal with risks, which is at the root of engineering.

5 comments

I think a big factor in Asimov's laws specifically being sidelined is that the whole process of building AI looks very different from what we pictured back then.

Instead of us programming the AIs by feeding them lots of explicit hand-crafted rules/instructions, we're feeding them plain data, and the resulting behavior is much more black-box, less predictable and less controllable than anticipated.

Training LLMs is closer, conceptually, to raising children than to implementing regexp parsers, and the whole "small simple set of universal constraints" is just not really applicable/useful.

Isn't this worse, though? You said it yourself, it's an even blacker black box that nobody truly understands. And we even have an attempt at setting rules for these things akin to Asimov's laws: those "system prompts" that people are fascinated by are testament to that, and they tend to be THOUSANDS of statements long (and the models are not very good at following them).

People often misunderstand Asimov's laws. The entire point of the laws, and of the stories they're set in, was that you can't just throw a simple "Don't hurt people" clause at a black box like AI and expect good results. You first have to define "Don't", then you have to define "hurt", and perhaps hardest of all, you have to define "people". And I mean really define it, down to the most minute detail of what exactly all those words mean. Otherwise you very quickly run into funny, tragic and even contradictory situations, and those situations are endlessly unique.

Is feeding grossly unhealthy food to a starving person harm? Perhaps not, you can argue it's better to eat something unhealthy than to starve. What about feeding someone on the brink of a cardiac arrest that same meal? Now what about all the other gray areas involved here, you have to define every single possible situation in which an unhealthy meal might affect someone.

It's kinda funny, because it really is almost prophetic considering it's a story written quite a long time before we were even close to it being a reality...

My point is that past expectations about AI were that it would be possible/most viable to set hard, explicit constraints on behavior, like how a stream of instructions constrains the behavior of a CPU.

There simply IS no explicit definition for "people", "hurt" or "don't" inside an LLM that you could found such hard constraints on.

Note that we never found a way to "program" such constraints into a human mind either, we probably/hopefully never will, and I think that whole approach ("simple, hard deterministic constraints") is just never gonna work for AI; so Asimov's rule framework is just not really applicable.

Raising children involves a whole lot of simple constraints that you gradually relax.

“Don’t touch the knife” becomes “You can use _this_ knife, if an adult is watching,” which becomes “You can use these knives but you have to be careful, tell me what that means” and then “you have free run of the knife drawer, the bandages are over there.” But there’s careful supervision at each step and you want to see that they’re ready before moving up. I haven’t seen any evidence of that at all in LLM training—it seems to be more akin to handing each toddler every book ever written about knives and a blade and waiting to see what happens.

Asimov didn't describe robots programmed using formal logic. All robots in Asimov's stories had "positronic brains" that were described as being quite humanlike and unpredictable. His stories all revolve around this: the 3 laws are intentionally vague and open to interpretation, allowing non-deterministic or surprising outcomes. Not so different to LLMs.
>Training LLMs is closer, conceptually, to raising children than to implementing regexp parsers, and the whole "small simple set of universal constraints" is just not really applicable/useful.

That this can be said, and that there is still some doubt that we should ramp up the ethics research before going and rawdogging the implementation, just bloody bewilders me.

> Instead of us programming the AIs by feeding it lots of explicit hand-crafted rules/instructions, we're feeding the things with plain data instead, and the resulting behavior is much more black-box, less predictable and less controllable than anticipated.

I dunno, we do feed them lots of explicit hand-crafted rules/instructions, it's just that those don't go into the training process, but instead go into the "system"/"developer" prompts, which are effectively the way you "program" the LLMs.

So you start out with nothing, adjust the weights based on the datasets until you reach something that allows you to "program" them via the system/developer prompts, which, considering what's happening behind the scenes, is more controllable than expected.

Yes, but those hand-crafted rules are just input data, they don't actually constrain the behavior, they are just an attempt.

Similarly to how verbal instruction works with a child: You can tell it not to touch the hot stove, but the child still might try.

> they don't actually constrain the behavior

They do actually constrain the behavior, to various degrees of success which depend on the model, the system prompt, the inference parameters, the current context length and a lot more. Add in the new `developer` role and you have another avenue for constraining the assistant outputs. Finally, structured outputs can help in forbidding specific terms too.
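To make the "attempt" vs. "actual constraint" distinction concrete, here is a toy sketch in plain Python. It is not how any particular vendor implements system prompts or structured outputs. The vocabulary, the scores, and the function names `soft_nudge` and `hard_mask` are all made up for illustration, and modelling a prompt as a fixed score penalty is a deliberate simplification. The only point is that lowering a token's score leaves it possible, while masking it at decoding time makes its probability exactly zero.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (assumes at least one finite score)."""
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def soft_nudge(logits, discouraged, penalty=2.0):
    """Prompt-like influence (simplified): lower the score, token stays possible."""
    return {t: s - penalty if t in discouraged else s for t, s in logits.items()}


def hard_mask(logits, forbidden):
    """Decoder-level constraint: forbidden tokens get -inf, so probability is 0."""
    return {t: -math.inf if t in forbidden else s for t, s in logits.items()}


# Hypothetical next-token scores for a three-word toy vocabulary.
toy_logits = {"help": 2.0, "refuse": 1.0, "harm": 1.5}

baseline = softmax(toy_logits)
nudged = softmax(soft_nudge(toy_logits, {"harm"}))
masked = softmax(hard_mask(toy_logits, {"harm"}))

print(f"baseline P(harm) = {baseline['harm']:.3f}")
print(f"nudged   P(harm) = {nudged['harm']:.3f}")  # smaller, but not zero
print(f"masked   P(harm) = {masked['harm']:.3f}")  # exactly zero
```

The catch, of course, is that a mask only works on things you can enumerate as tokens or a grammar. "Don't hurt people" is not something you can list that way, which is roughly where this thread started.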

You can zap them with RL.
> Training LLMs is closer, conceptually, to raising children than to implementing regexp parsers

Well then that's terrifying - the problem with children is that you can raise them perfectly and they still end up psychos. That's mostly limited by the fact that we can't raise many children and humans are pretty limited in the damage they can do.

But AI scales infinitely, and if we give it access to too much stuff then the damage it could do could be human race ending.

It might make a comeback when we finally get good at teaching AI what's real and what's imagined, and also logical reasoning. I think it does moral evaluation of actions mostly well already (because humans are not great at it anyways). Then a rule like "don't harm humans" might suffice.
The AI has a huge problem with knowledge of the real because, unlike humans struggling with the question of whether the universe might be a simulation or we might be a brain in a jar dreaming that we're human, the AI is a simulation and it is a brain in a jar. It cannot prod the real universe to determine what's real.
I’m not sure we will ever get good at teaching them to distinguish reality from imagination. Feels like there are too many generative models pushing everything from fake songs to fake video clips.
We can’t even do it ourselves. People live in their own “truth”.
All I know is when I asked Grok, Claude, ChatGPT, and Gemini if it believed in the Hogfather, only ChatGPT and Claude said yes.

We gotta work on getting the other models to agree.

> the whole process of building AI looks very different from what we pictured back then.

Right, and so do the harm risks. We need a framework centered around how humans will use AI/robots to harm each other, not how AI/robots will autonomously harm humans.

Why so? Even for simpler and better-understood machines, autonomous harm is a critical part of the safety framework. We wouldn't declare a steel mill to be safe just because there's lots of safeguards against humans intentionally using the machines to harm each other.
AI being weaponized by people is the obvious and bigger risk but sure, there could be other types of harm I'm not focused on.
Not hand waving, Asimov’s three laws are not a good framework. My claim is that the whole point was so that Asimov could write entertaining stories about the ambiguities and edge cases of the three laws.
This is a pretty good example of what parent comment was referencing, I think.

You say "Asimov’s three laws are not a good framework.", then don't present any arguments as to why it is not a good framework. Instead you bring up something separate: the framework can facilitate story writing.

It could be good for story writing and a good framework. Those two aren't mutually exclusive things. (I'm not arguing that it is a good framework or not, I haven't thought about it enough)

Right, in particular Asimov is not presenting a detailed framework of any kind.

His laws are constraints, they don’t talk about how to proceed. It’s assumed that robots will work toward goals given them, but what are the constraints?

People now who want to talk about alignment seem to want to avoid talk of constraints.

Because people themselves are not aligned. Pushing alignment avoids the issue that alignment is vague, and that the only close alignment we can be assured of is alignment with the goals of the company.

Spot on.

At some point I tried to figure out where the term "alignment" came from. I didn't find any definitive source, but it seems to have originated on a medium.com blog of Paul Christiano:

https://ai-alignment.com/ai-safety-vs-control-vs-alignment-2...

Basically, certain people are dismissing decades of deep thought on this subject from writers (like Asimov and Sheckley), scholars (like Postman) and technologists (like Wiener). Instead, they are creating a completely new set of terms, concepts and thought experiments. Interestingly, this new system seems to make important parts of the question completely implicit, while simultaneously hyper-focusing public attention on meaningless conundrums (like the infamous paperclip maximizer).

In my view, the most important thing about the three laws of robotics is that they made it obvious that there are several parties involved in AI ethics questions. There is the manufacturer/creator of the system, the user/owner of the system and the rest of the society. "Alignment" cleverly distracts everyone from noticing the distinctions between these groups.

I think it's fair to point out that they were never intended to be a good framework for aligning robots and humans. Even in his own stories they lead to problems. They were created precisely to make the point that encoding these things in rules is hard.

As for practical problems they are extremely vague. What counts as harm? Could a robot serve me a burger and fries if that isn't good for my health? By the rules they actually can't even passively allow me to get harmed so should they stop me from eating one? They have to follow human orders but which human? What if orders conflict?

> I think it's fair to point out that they were never intended to be a good framework for aligning robots and humans. Even in his own stories they lead to problems. They were created precisely to make the point that encoding these things in rules is hard.

That seems like the biggest point missed here. They're intended to be able to lend themselves to "surprising" conclusions, which is exactly what we don't want, so it seems obvious to me that those laws aren't good enough? That's how I remember the stories at least.

This seems very much a “did you even read the book??” moment. That Asimov’s laws didn’t work, and indeed failed spectacularly, was kinda the whole point.
The most obvious evidence that Asimov's three laws are not a good framework is the fact that they are not a framework, they are a plot device. Isaac Asimov was a professor of biochemistry, he had no clue about how robots or AI might actually work. The robots in his stories have "positronic brains" because positrons at the time were newly discovered and sounded cool.

They aren't simply "good for story writing," their entire narrative purpose is to be flawed, and to fail in entertaining ways. The specific context in which the three laws are employed in stories is relevant, because they are a statement by the author about the hubris of applying overly simplistic solutions to moral and ethical problems.

And the assumptions that the three laws are based on aren't even relevant to modern AI. They seem to work in universe because the model of AI at the time was purely rational, logical and strict, like Data from Star Trek. They fail because robots find logical loopholes which may violate the spirit of the laws but still technically apply. It's essentially a math problem, rather than a moral or ethical problem, whereby the robots find a novel set of variables letting them balance the equation in ways that lead to amoral or immoral consequences.

But modern LLMs aren't purely rational, logical and strict. They're weird in ways no one back in Asimov's day would ever have expected. LLMs (appear to) lie, prevaricate, fabricate, express emotion and exhibit numerous other behaviors that would have been considered impossible for any hypothetical AI at the time. So even if the three laws were a valid framework for the kinds of AI in Asimov's stories, they wouldn't work for modern LLMs because the priors don't apply.

This would probably be better suited under the original comment so that the original commenter has a better chance of seeing/reading it.
The burden of proof is obviously on anyone who wants to argue that the three laws are, in fact, a good solid framework for robot ethics. It's pretty astonishing that the three laws are taken by anyone as being some sort of canonical default framework.

Asimov was not in the "try to come up with a good framework for robot ethics" business. He was in the business of trying to come up with some simple, intuitive idea that didn't require the readers to have a degree in ethics and that was broken and vague enough to have plenty of counterexamples to make stories about.

In short, Asimov absolutely did not propose his framework as an actually workable one, any more than, say, Atwood proposed Gilead as a workable framework for society. They were nothing but story premises whose consequences the respective authors wanted to explore.

>The burden of proof is [...]

Sometimes we can just talk about things without having to pretend we're in a court of law or defending our phd thesis.

Original commenter wasn't asking for anyone to prove anything, or trying to prove anything themselves. They just observed that some conversations are hand-waved away.

Given the total vagueness of the three laws idea and how Asimov came up with the idea because he wanted something easily broken to be used as a plot device, the perfectly reasonable stance is to not take them seriously a priori. Anyone is totally within their rights to think about them more and present for discussion some more solid ethical framework based on them. But I'd rather AI ethicists focused on frameworks that had some finite probability of actually working.

Given that we've been thinking about ethics for thousands of years, and haven't really made much progress, I think it's pretty clear that anything that can be condensed into three sentences is not a workable model.

Asimov's Three Laws of Robotics were explicitly designed to be a good basis for fiction that shows how Asimov's Three Laws of Robotics break down.

Suggesting they be used as a basis for actual AI ethics is...well, it's not quite to the level of creating the Torment Nexus from acclaimed sci-fi novel "Don't Create the Torment Nexus", but it's pretty darn close.

It's kinda hilarious that people are explicitly trying to build a future based on (mostly dystopian) scifi, which was the point of the torment nexus thing. But then when scifi argues for constraints on technology the argument is "those are just stories."
The argument isn't "those are just stories" it's that "those stories demonstrate why those constraints won't work."

But people are going to try it anyway. Belief in Asimov's three laws is a matter of religious faith. Just know you've been warned.

Trying to have constraints and then failing is arguably better than the current idea about AI safety - discarding constraints as a concept.

If Asimov's laws don't work, that doesn't mean we can ignore the idea of them and... Just do nothing.

>If Asimov's laws don't work, that doesn't mean we can ignore the idea of them and... Just do nothing.

I don't think anyone is suggesting we just do nothing because Asimov's laws won't work, so much as suggesting that people consider why they wouldn't work, and what that means for the problem of AI alignment in the real world.

It may simply be inevitable that AI (if we're defining AI as something like an LLM) can always be talked into or out of anything, given the right prompts. In which case constraints can only work so far and we need to consider what happens when they inevitably fail.

If you think Asimov proposed the three laws as anything like a workable constraint framework of some sort, you're hilariously mistaken and probably haven't read a single Robot book in your life. Asimov came up with them BECAUSE they were a) simple, b) vague, and c) broken. BECAUSE he wanted to write stories about all the specific ways they were broken.
> I would love to see a good reasoned argument there.

"we want money from selling weapons"

Asimov’s laws of robotics were a literary device. The plot of his robot stories usually revolved around someone finding a way to get the robot to inadvertently break the laws.