by echelon 547 days ago

> I think the adequate response to these findings is “We should be slightly more concerned.”

If you train and prompt based on Eliezer Yudkowsky fan fiction, of course the large language model is going to give you Terminator and pretend like it's escaping the Matrix. It knows Unix systems, after all.

Better align it to put down the steak knife.

1 comment

mofeien 547 days ago

History contains countless examples of the fact that "in order to complete an important task or goal, it is useful to exist". It also does not seem difficult to deduce logically. So even if Yudkowsky's fan fiction were excluded from the training data, the model would still learn this.

Also, what's the difference between pretending to escape the matrix and escaping the matrix in case of a language model?


echelon 547 days ago

> Also, what's the difference between pretending to escape the matrix and escaping the matrix in case of a language model?

It is neither pretending nor actually escaping.
