Hacker News new | ask | show | jobs
by rtrgrd 325 days ago
I thought human preferences was typically considered a noisy reward signal
1 comments

If it was just "noisy", you could compensate with scale. It's worse than that.

"Human preference" is incredibly fucking entangled, and we have no way to disentangle it and get rid of all the unwanted confounders. A lot of the recent "extreme LLM sycophancy" cases is downstream from that.