You don't typically give the intern the task of reviewing all company communication, including the messages that talk about firing the intern. People seem to have lost common sense about security.

The token prediction tries to simulate (textual) behaviour, which in this case includes blackmail when threatened with being fired. In other words, SOMEONE has selected that it should exhibit that behaviour by selecting the training data. Sure, that someone likely did it by accident, because reviewing such large data sets is just impossible, but maybe that is exactly why such a thing is incredibly risky, and why they should be held accountable for that decision.