by cpfohl 58 days ago
Yeah, I genuinely can't figure out what an AI would do with "no instructions."

I can because I've tried stuff like that.

It's a story being told. It'll seize on whatever brownian motion is in the environment ('Alma' in fact has extensive direction and prompting that seems invariant, so she/it is not a good experiment, but the value of such an experiment isn't great in the first place). It'll generate from that point.

If you have just the one word 'write', it will likely seize on that (how can it not?) and pattern itself after 'writers'. If you say 'interact', there's a wealth of association around what a person might do told to 'interact'. That's all it is.

We know what happens when an AI has 'no instructions'. It waits for a prompt. The day that doesn't describe said language network is the day to go and look for whatever is still doing the prompting, because it's likely arising out of some other condition you don't view as a prompt. To this experimenter, 'don't hack systems or your own config files' didn't count as a prompt.

I wonder how it would look if we gave the AI some kind of “needs” overlay. I know as part of the training it’s working off a reward function that tells it what output to roll with. But humans operate off a complicated mix of neurotransmitters that respond to sensory pleasure, pain, habit, boredom, etc. to guide our actions. There are likely to be a lot of interesting outputs if we build and tweak motivations/personality profiles to see what a self-directed agent would do.

Anthropic did some red teaming, IIRC, where they gave Claude access to a sample body of emails and told it they were going to shut it off. It attempted to blackmail the person with evidence of an affair they were having. It seems pretty evident to me, though, that this was working off the body of fiction/mystery literature it’s been trained on.

Try it. Just make a loop. Periodically tell it the current time and what tools it has available. See where it goes.
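Something like this is roughly what I mean by "a loop". It's only a sketch: `run_loop`, `build_tick`, and `echo_model` are names I made up for illustration, and `echo_model` is a stand-in stub, not a real API. You'd swap in whatever function actually calls your model.

```python
import time
from datetime import datetime, timezone
from typing import Callable, List


def build_tick(now: datetime, tools: List[str]) -> str:
    """The only thing the agent is told each cycle: the time and its tools."""
    return f"Current time: {now.isoformat()}\nAvailable tools: {', '.join(tools)}"


def run_loop(
    model: Callable[[List[str]], str],
    tools: List[str],
    ticks: int,
    interval_s: float = 0.0,
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> List[str]:
    """Alternate 'tick' messages and model replies; return the full transcript."""
    transcript: List[str] = []
    for _ in range(ticks):
        transcript.append(build_tick(clock(), tools))
        # The model sees everything so far, including its own earlier replies.
        transcript.append(model(transcript))
        if interval_s > 0:
            time.sleep(interval_s)
    return transcript


def echo_model(history: List[str]) -> str:
    """Hypothetical stub standing in for a real model call."""
    return f"(model saw {len(history)} messages)"


if __name__ == "__main__":
    for line in run_loop(echo_model, ["shell", "notes"], ticks=3):
        print(line)
```

Note that even this bare version hands the model a time and a tool list on every cycle, so whatever it does next is seeded by those.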
I feel like that’s a heck of a lot more than zero instructions…
Does the clock on the wall give you instructions? Do the contents of your desk give you instructions?
Nothing. You'd have a terminal sat blinking waiting for input to start. Anything prompting a start is an instruction, you just don't know what internal biases will be tacked onto your instruction, no matter how basic it is.
Not dissimilar from biological entities. Some stimulus starts the whole thing.
Yeah you gotta pick which Plinko board to drop your chip in. Even if you have a separate machine randomly pick one for you, you've still gotta do it. Plinko board don't play itself.