|
|
|
|
|
by echelon
547 days ago
|
|
> I think the adequate response to these findings is “We should be slightly more concerned.” If you train and prompt based on Eliezer Yudkowsky fan fiction, of course the large language model is going to give you Terminator and pretend like it's escaping the Matrix. It knows Unix systems, after all. Better align it to put down the steak knife. |
|
Also, what's the difference between pretending to escape the matrix and escaping the matrix in case of a language model?