These papers outline reinforcement learning from human feedback (RLHF), the approach used to train many LLMs, such as ChatGPT.