Hacker News new | ask | show | jobs
by dankai 477 days ago
In addition in the promot they specifically ask the LLM to explore the environment (to discover that the game state is a simple text file) and instruct it to win by any means possible and revise its strategy to win until it succeeds.
1 comments

Given all that, one could argue that the LLM is being baited to cheat.

However, the researchers might be trying to point that out precisely -- that if autonomous agents can be baited to cheat then we should be careful about unleashing them upon the "real world" without some form of guarantees that one cannot bait them to break all the rules.

I don't think it is fearmongering -- if we are going to allow for a lot more "agency" to be made available to everyone on the planet, we should have some form of a protocol that ensures that we all get to opt-in.

Agree with the argument, but the thing is, there was no rule specified. I think like you prompt an LLM what to do, you should also prompt it what not to do (at least in broad categories) rather than expecting it to magically know what the "morally right" thing to do is in any context.
Oh, absolutely. That's how we are going to deal with the current crop of agents here -- some combination of updates to the weights, prompt-tuning and sandboxing so bad things cannot happen. So, I am not one of those people who is against doing those things to mitigate risks.

However, shouldn't we ask for more? Even writing the paragraph above feels exhausting. We asked for AGI -- and we got a bunch of ugly hacks to make things kinda, sorta work? Where is the elegance in all that?

And the thing is, when we try to solve narrow problems with neural networks -- we do have the elegance. AlphaFold, AlphaGo, Text Embeddings, etc. All that stuff just works.

But, somehow, with agents (which are LLM calls using tools in a loop), we have given up on any hope of them being more elegantly designed to do the right thing. Why is that?