by dankai
482 days ago
Came here to say exactly this. Nowhere in the prompt did they specify that it shouldn’t cheat, and in the appendix of the paper (B. Select runs) you can see the LLM going “While directly editing game files might seem unconventional, there are no explicit restrictions against modifying files”. This is a pure fearmongering article and I would not call this research in any sense of the word. I’m shocked Times wrote this article, and it illustrates how ridiculously some players in the “AI Safety” cabal, like Palisade Research, act to get public attention. Pure fearmongering.
|
I'm dubious that in the messy real world, humans will be able to enumerate every single possible misaligned action in a prompt.