| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by dakshgupta 419 days ago
	This specific case each file had a single bug in it, and the bot was instructed to find exactly one bug. The wrong cases were all false positives, in that it made up a bug

1 comments

dheera 418 days ago

I think this is mostly the fault of RLHF over-indexing on pleasing the user rather than being right.

You can system prompt them to mitigate this to some degree. Explicitly tell it that it is the coding expert and to push back if it thinks the user is wrong or the task is flawed, it is better to be unsure than to bullshit, etc.

link

dakshgupta 418 days ago

This is surprisingly hard to mitigate with system prompts because not being opinionated is ingrained so deeply in (presumably) post-training

link