It's the "alignment tax". From OpenAI's RLHF paper[1]: "By default, when we train a PPO model on our API distribution, it suffers from an 'alignment tax', as its performance on several public NLP datasets decreases." On the HELM[2] site, you can compare accuracy benchmarks for OpenAI's InstructGPT models against baseline models. The InstructGPT models perform worse on a lot of the benchmarks.