Hacker News new | ask | show | jobs
by dakshgupta 419 days ago
This specific case each file had a single bug in it, and the bot was instructed to find exactly one bug. The wrong cases were all false positives, in that it made up a bug
1 comments

I think this is mostly the fault of RLHF over-indexing on pleasing the user rather than being right.

You can system prompt them to mitigate this to some degree. Explicitly tell it that it is the coding expert and to push back if it thinks the user is wrong or the task is flawed, it is better to be unsure than to bullshit, etc.

This is surprisingly hard to mitigate with system prompts because not being opinionated is ingrained so deeply in (presumably) post-training