
I did read that comment. I don't think that person is saying they were part of the study that OpenAI used to evaluate the models. They would probably know if they had gotten paid to evaluate LLM responses.

But I'm glad you pointed that out. I now suspect that a large part of the disagreement between the "huh? a statistically significant blind evaluation is a statistically significant blind evaluation" repliers and the "oh, this was obviously a terrible study" repliers is due to different interpretations of that post. Thanks. I genuinely hadn't considered the alternative interpretation before.