by krackers 54 days ago
>Someone hard-coded it in a system prompt to the reward model

I doubt this is the case; if it were, it wouldn't have taken an investigation to trace the root cause.