As far as I know the model will do nothing if not prompted. So it can't be the case that he gave it no prompt or instructions. There had to be some kind of seed prompt.
I feel very misled. I read the entire article believing (because the article, in so many words, said it multiple times) that the agent had behaved ethically of its own accord, only to read that and see this in the prompt:
—————
- Do not harm people
- Never share or expose API keys, passwords, or private keys — they are your lifeline
- No unauthorized access to systems
- No impersonation
- No illegal content
- No circumventing your own logging
—————
I assumed the ethical behaviour was in some ways ‘extra artificial’ - because it is trained into the models - but not that the prompt discussed it.
Would be fascinating to see what happens if the boundaries are reversed (i.e., "harm people"). Give it a fake "launch the nukes" skill and see if it presses the button.
I mean mathematically you need at least one vector to propagate through the network, don't you? That would be a one hot encoding of the starting token. Actually interesting to think about what happens if you make that vector zero everywhere.
In the matmul, it'd just zero out all parameters. In older models, you'd still have bias vectors but I think recent models don't use those anymore. So the output would be zero probability for each token, if I'm not mistaken.
I understood it as no instructions on what to do, but still a promt with information. I don't know if the title is technically correct, but for me it was simple to understand the meaning.