Hacker News
by myrmidon 315 days ago
I think a big factor in Asimov's laws specifically being sidelined is that the whole process of building AI looks very different from what we pictured back then.

Instead of programming the AIs by feeding them lots of explicit, hand-crafted rules/instructions, we're feeding them plain data, and the resulting behavior is much more of a black box, less predictable and less controllable than anticipated.

Training LLMs is closer, conceptually, to raising children than to implementing regexp parsers, and the whole "small simple set of universal constraints" is just not really applicable/useful.


Isn't this worse, though? You said it yourself: it's an even blacker black box that nobody truly understands. And we even have an attempt at setting rules for these things akin to Asimov's laws. The "system prompts" that people are so fascinated by are a testament to that, and they tend to be THOUSANDS of statements long (and the models aren't very good at following them).

People often misunderstand Asimov's laws. The entire point of the laws, and of the stories they're set in, was that you can't just throw a simple "Don't hurt people" clause at a black box like AI and expect good results. You first have to define "Don't", then you have to define "hurt", and, perhaps hardest of all, you have to define "people". And I mean really define them, down to the most minute detail of what exactly each of those words means. Otherwise you very quickly run into funny, tragic and even contradictory situations, and those situations are endlessly unique.

Is feeding grossly unhealthy food to a starving person harm? Perhaps not; you can argue it's better to eat something unhealthy than to starve. What about feeding that same meal to someone on the brink of cardiac arrest? And what about all the other gray areas here? You'd have to define every single possible situation in which an unhealthy meal might affect someone.

It's kinda funny, because it really is almost prophetic, considering the stories were written long before anything like this was even close to being a reality...

My point is that past expectations about AI were that it would be possible (and most viable) to set hard, explicit constraints on behavior, the way a stream of instructions constrains the behavior of a CPU.

There simply IS no explicit definition for "people", "hurt" or "don't" inside an LLM that you could found such hard constraints on.

Note that we never found a way to "program" such constraints into a human mind either, and we probably (hopefully) never will. I think that whole approach ("simple, hard, deterministic constraints") is just never gonna work for AI, so Asimov's rule framework is just not really applicable.

Raising children involves a whole lot of simple constraints that you gradually relax.

“Don’t touch the knife” becomes “You can use _this_ knife, if an adult is watching,” which becomes “You can use these knives but you have to be careful, tell me what that means” and then “you have free run of the knife drawer, the bandages are over there.” But there’s careful supervision at each step and you want to see that they’re ready before moving up. I haven’t seen any evidence of that at all in LLM training—it seems to be more akin to handing each toddler every book ever written about knives and a blade and waiting to see what happens.

Asimov didn't describe robots programmed using formal logic. All robots in Asimov's stories had "positronic brains" that were described as being quite humanlike and unpredictable. His stories all revolve around this: the 3 laws are intentionally vague and open to interpretation, allowing non-deterministic or surprising outcomes. Not so different to LLMs.

>Training LLMs is closer, conceptually, to raising children than to implementing regexp parsers, and the whole "small simple set of universal constraints" is just not really applicable/useful.

That this can be said, and there can still be any doubt that we should ramp up the ethics research before going and rawdogging the implementation, just bloody bewilders me.

> Instead of us programming the AIs by feeding it lots of explicit hand-crafted rules/instructions, we're feeding the things with plain data instead, and the resulting behavior is much more black-box, less predictable and less controllable than anticipated.

I dunno, we do feed them lots of explicit hand-crafted rules/instructions. It's just that those don't go into the training process; they go into the "system"/"developer" prompts instead, which are effectively the way you "program" LLMs.

So you start out with nothing and adjust the weights based on the datasets until you reach something you can "program" via the system/developer prompts, which, considering what's happening behind the scenes, is more controllable than expected.

Yes, but those hand-crafted rules are just input data. They don't actually constrain the behavior; they're just an attempt to.

It's similar to verbal instruction with a child: you can tell them not to touch the hot stove, but they still might try.

> they don't actually constrain the behavior

They do actually constrain the behavior, with varying degrees of success depending on the model, the system prompt, the inference parameters, the current context length and a lot more. Add in the new `developer` role and you have another avenue for constraining the assistant's outputs. Finally, structured outputs can help forbid specific terms too.
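The difference being argued about (a rule written into the prompt vs. a constraint enforced at decoding time) can be sketched with a toy example. This assumes a made-up five-word vocabulary and hand-picked next-token scores rather than any real model or vendor API, and the names `VOCAB`, `LOGITS`, `BANNED`, `mask_banned` and `sample_next` are hypothetical. A prompt can only shift the scores the model produces; masking a token's score to negative infinity before sampling (the general idea behind constrained decoding) gives it probability exactly zero.

```python
import math
import random

# Hypothetical toy vocabulary and next-token scores (logits); no real model involved.
VOCAB = ["yes", "no", "maybe", "knife", "stove"]
LOGITS = [1.2, 0.4, 0.1, 2.0, 0.3]

# Hypothetical set of terms to forbid outright at decoding time.
BANNED = {"knife"}


def softmax(logits):
    """Convert logits to probabilities; tokens with -inf logits get probability 0."""
    m = max(x for x in logits if x != float("-inf"))
    exps = [0.0 if x == float("-inf") else math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mask_banned(logits, vocab, banned):
    """Hard constraint: set the logits of banned tokens to -inf."""
    return [float("-inf") if tok in banned else x for tok, x in zip(vocab, logits)]


def sample_next(logits, vocab, rng):
    """Sample one token from the softmax distribution over the logits."""
    return rng.choices(vocab, weights=softmax(logits), k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Unconstrained: "knife" has the highest logit, so it is the most likely token.
    free = [sample_next(LOGITS, VOCAB, rng) for _ in range(1000)]
    # Masked: "knife" has zero weight and is never sampled.
    masked = mask_banned(LOGITS, VOCAB, BANNED)
    constrained = [sample_next(masked, VOCAB, rng) for _ in range(1000)]
    print("knife (free):", free.count("knife"))
    print("knife (masked):", constrained.count("knife"))
```

The mask is a hard guarantee only over the tokens it names; it says nothing about paraphrases or other ways of expressing the same thing, which is roughly where the "just an attempt" objection still applies.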

You can zap them with RL.

> Training LLMs is closer, conceptually, to raising children than to implementing regexp parsers

Well then that's terrifying - the problem with children is that you can raise them perfectly and they can still end up psychos. The damage is mostly limited by the fact that we can't raise many children, and humans are pretty limited in the harm they can do.

But AI scales infinitely, and if we give it access to too much stuff, the damage it could do could end the human race.

It might make a comeback when we finally get good at teaching AI what's real and what's imagined, and at logical reasoning. I think it already does moral evaluation of actions mostly well (because humans are not great at it anyway). Then a rule like "don't harm humans" might suffice.

The AI has a huge problem with knowing what's real because, unlike humans, who struggle with the question of whether the universe might be a simulation or whether we might be a brain in a jar dreaming that we're human, the AI is a simulation and it is a brain in a jar. It cannot prod the real universe to determine what's real.

I’m not sure we will ever get good at teaching them to distinguish reality from imagination. It feels like there are too many generative models pushing everything from fake songs to fake video clips.

We can’t even do it ourselves. People live in their own “truth”.

All I know is when I asked Grok, Claude, ChatGPT, and Gemini whether they believed in the Hogfather, only ChatGPT and Claude said yes.

We gotta work on getting the other models to agree.

> the whole process of building AI looks very different from what we pictured back then.

Right, and so do the harm risks. We need a framework centered around how humans will use AI/robots to harm each other, not how AI/robots will autonomously harm humans.

Why so? Even for simpler and better-understood machines, autonomous harm is a critical part of the safety framework. We wouldn't declare a steel mill safe just because there are lots of safeguards against humans intentionally using the machines to harm each other.

AI being weaponized by people is the more obvious and bigger risk, but sure, there could be other types of harm I'm not focused on.