Hacker News
by Cyan488 44 days ago
What about when the model trusts itself more than the "black box" you gave it, and either hallucinates having used it or quietly skips it and reimplements the functionality itself? I found this video about "intelligent disobedience" interesting.

https://www.youtube.com/watch?v=Qu-00j9XuF0