Hacker News new | ask | show | jobs
Come down a rabbit hole with OpenAI o1 system card (portkey.ai)
1 point by roh26it 550 days ago
1 comment

The most interesting findings (at least from my perspective):

- OpenAI measured "honesty", which could become an interesting model eval for others to adopt.
- Using instruction hierarchy to _avoid_ prompt injections.
- When pushed to pursue misaligned goals, o1 tried to disable oversight in 5% of tests!