| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by astrange 1249 days ago

I meant the blog post.

https://openai.com/blog/chatgpt/

> The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.12

> Stiennon, Nisan, et al. “Learning to summarize with human feedback.” Advances in Neural Information Processing Systems 33 (2020): 3008-3021. ↩

> Gao, Leo, John Schulman, and Jacob Hilton. “Scaling Laws for Reward Model Overoptimization.” arXiv preprint arXiv:2210.10760 (2022). ↩