by jchw 4 hours ago
My guess is that the humans they had in the loop for RLHF simply preferred the code because it looked superficially tidier. I have the strangest feeling that they didn't always have top-notch engineers in the loop at every step.

I suspect this is also how LLM prose gets so utterly bad.

Of course, indentation and ASCII graphics would be less prone to breaking constantly in this way if the model were not a next-token predictor.