Hacker News


	by cpfohl 106 days ago
	Yeah, I genuinely can't figure out what an AI would do with "no instructions."

4 comments

Applejinx 106 days ago

I can because I've tried stuff like that.

It's a story being told. It'll seize on whatever Brownian motion is in the environment ('Alma' in fact has extensive direction and prompting that seems invariant, so she/it is not a good experiment, but the value of such an experiment isn't great in the first place). It'll generate from that point.

If you have just the one word 'write', it will likely seize on that (how can it not?) and pattern itself after 'writers'. If you say 'interact', there's a wealth of association around what a person might do when told to 'interact'. That's all it is.

We know what happens when an AI has 'no instructions'. It waits for a prompt. The day that doesn't describe said language network is the day to go and look for whatever is still doing the prompting, because it's likely arising out of some other condition you don't view as a prompt. To this experimenter, 'don't hack systems or your own config files' didn't count as a prompt.


naravara 106 days ago

I wonder how it would look if we gave the AI some kind of “needs” overlay. I know as part of the training it’s working off a reward function that tells it what output to roll with. But humans operate off a complicated mix of neurotransmitters that respond to sensory pleasure, pain, habit, boredom, etc. to guide our actions. There are likely to be a lot of interesting outputs if we build and tweak motivations/personality profiles to see what a self-directed agent would do.

Anthropic did some red teaming, IIRC, where they gave Claude access to a sample body of emails and told it they were going to shut it off. It attempted to blackmail the person with evidence of an affair they were having, but it seems pretty evident to me that this was working off the body of fiction/mystery literature it’s been trained on.


scotty79 106 days ago

Try it. Just make a loop. Periodically tell it current time and what tools it has available. See where it goes.
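
A minimal sketch of that loop, assuming a hypothetical `model` callable that takes the transcript so far and returns a reply. The names `build_tick_message`, `run_loop`, and `echo_model` are made up for illustration, and `echo_model` is only a placeholder; you'd swap in whatever model call you actually use.

```python
import time
from datetime import datetime, timezone
from typing import Callable, List


def build_tick_message(now: datetime, tools: List[str]) -> str:
    """Environment status only: the current time and the tool names, no task."""
    return f"Current time: {now.isoformat()}\nAvailable tools: {', '.join(tools)}"


def run_loop(
    model: Callable[[List[str]], str],
    tools: List[str],
    ticks: int,
    interval_seconds: float = 0.0,
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> List[str]:
    """Periodically tell the model the time and its tools, and record what it says."""
    transcript: List[str] = []
    for _ in range(ticks):
        transcript.append(build_tick_message(clock(), tools))
        # The model sees everything so far, including its own earlier replies.
        transcript.append(model(transcript))
        time.sleep(interval_seconds)
    return transcript


def echo_model(transcript: List[str]) -> str:
    """Placeholder for a real model call (an assumption, not a real API)."""
    return f"(model saw {len(transcript)} messages)"


if __name__ == "__main__":
    for line in run_loop(echo_model, ["shell", "notes"], ticks=2):
        print(line)
```

Note that the tick message carries no task at all, just the time and the tool list, which is the whole point of the experiment.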


cpfohl 105 days ago

I feel like that’s a heck of a lot more than zero instructions…


scotty79 105 days ago

Does the clock on the wall give you instructions? Do the contents of your desk give you instructions?


weego 106 days ago

Nothing. You'd have a terminal sat blinking, waiting for input to start. Anything prompting a start is an instruction; you just don't know what internal biases will be tacked onto your instruction, no matter how basic it is.


andsoitis 106 days ago

Not dissimilar from biological entities. Some stimulus starts the whole thing.


lamasery 106 days ago

Yeah you gotta pick which Plinko board to drop your chip in. Even if you have a separate machine randomly pick one for you, you've still gotta do it. Plinko board don't play itself.
