HN Mirror


by jchw 4 hours ago

My guess is that the humans they had in the loop for RLHF simply preferred the code because it looked superficially tidier. I have the strangest feeling that they didn't always have top-notch engineers in the loop at every step.

I suspect this is also how LLM prose gets so utterly bad.

Of course, for indentation and ASCII graphics, the model would be less prone to constantly breaking in this way if it were not a next-token predictor.