by vellum 1168 days ago

It's the "alignment tax". From OpenAI's RLHF paper[1]: "By default, when we train a PPO model on our API distribution, it suffers from an “alignment tax”, as its performance on several public NLP datasets decreases." On the HELM[2] site, you can see accuracy benchmarks for InstructGPT <OpenAI model> vs baseline models. The InstructGPT models perform worse on a lot of benchmarks.

1 - https://arxiv.org/pdf/2203.02155.pdf

2 - https://crfm.stanford.edu/helm/v0.1.0/?group=question_answer...