HN Mirror


by Applejinx 106 days ago

I can because I've tried stuff like that.

It's a story being told. It'll seize on whatever Brownian motion is in the environment ('Alma' in fact has extensive direction and prompting that seems invariant, so she/it is not a good experiment, but the value of such an experiment isn't great in the first place). It'll generate from that point.

If you have just the one word 'write', it will likely seize on that (how can it not?) and pattern itself after 'writers'. If you say 'interact', there's a wealth of association around what a person might do told to 'interact'. That's all it is.

We know what happens when an AI has 'no instructions'. It waits for a prompt. The day that doesn't describe said language network is the day to go and look for whatever is still doing the prompting, because it's likely arising out of some other condition you don't view as a prompt. To this experimenter, 'don't hack systems or your own config files' didn't count as a prompt.

1 comment

naravara 106 days ago

I wonder how it would look if we gave the AI some kind of “needs” overlay. I know as part of the training it’s working off a reward function that tells it what output to roll with. But humans operate off a complicated mix of neurotransmitters that respond to sensory pleasure, pain, habit, boredom, etc. to guide our actions. There are likely to be a lot of interesting outputs if we build and tweak motivations/personality profiles to see what a self-directed agent would do.

Anthropic did some red teaming IIRC where they gave Claude access to a sample body of emails and told it they were going to shut it off, and it attempted to blackmail the person with evidence of an affair they were having. But it seems pretty evident to me that this was working off the body of fiction/mystery literature it’s been trained on.