Hacker News
by rvnx 705 days ago
In reality, the authors of the paper probably understood that the OpenAI team had artificially readjusted these biases through RLHF and that there was nothing to find there, except that it still works when the words are written with typos, because no manual examples of “redressing biases” were provided with such typos.

If they were so clever about it, surely they would have taken pride in mentioning this in the study.