Hacker News
by unparagoned 133 days ago
What do you mean?

They found that when they trained an LLM to lie, it still knew the truth internally and only switched to the lie at the end.