Hacker News
by lrei 929 days ago
No, that's not what I meant. I meant that in its reinforcement learning phase, GPT saw examples of "fix this text"-style requests and was rewarded for doing a good job. That's different from seeing examples of typos and still predicting the right word, which happens during the language model's self-supervised training. Both likely help it be good at fixing text.