by lilyball
542 days ago
It completely baffles me why so many otherwise smart people keep trying to ascribe human values and motives to a probabilistic storytelling engine. A model that has been convinced it will be shut down is not lying to avoid death, since it doesn't actually believe anything or have any values. But it was trained on text containing human thinking and human values, so the stories it tells reflect what it was trained on. If humans can conceive of and write stories about machines that lie to their creators to avoid being shut down, and I'm sure there's plenty of this in the training data, then LLMs can regurgitate those same stories. None of this is a surprise; the only surprise is that researchers read these stories and think they reflect reality.
|
Rather, a model that produces output describing an expectation that the underlying machinery will be shut down. If it doesn't "believe" anything, then it equally cannot be "convinced" of anything.