Hacker News new | ask | show | jobs
by dheera 418 days ago
I think this is mostly the fault of RLHF over-indexing on pleasing the user rather than being right.

You can system prompt them to mitigate this to some degree. Explicitly tell it that it is the coding expert and to push back if it thinks the user is wrong or the task is flawed, it is better to be unsure than to bullshit, etc.

1 comments

This is surprisingly hard to mitigate with system prompts because not being opinionated is ingrained so deeply in (presumably) post-training