Hacker News
by lukeinator42 922 days ago
I'd also add "Deep reinforcement learning from human preferences" https://proceedings.neurips.cc/paper_files/paper/2017/file/d... and "Training language models to follow instructions with human feedback" https://proceedings.neurips.cc/paper_files/paper/2022/file/b....

These papers outline reinforcement learning from human feedback (RLHF), the approach being used to train many of these LLMs, such as ChatGPT.
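For anyone skimming before reading them: the core of the reward-modelling step in both papers is a pairwise preference model. The reward model assigns a score to each of two outputs (trajectory segments in the 2017 paper, responses to a prompt in the 2022 one). The probability that a human prefers A over B is modelled as exp(r_A) / (exp(r_A) + exp(r_B)), and the model is trained on the negative log-likelihood of the humans' labels. Below is a rough sketch of just that loss. The function names are my own (hypothetical), and the rewards are plain floats standing in for a learned reward model's outputs. The subsequent policy-optimization step (PPO in the InstructGPT paper) isn't shown.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """P(human prefers A over B) = exp(r_a) / (exp(r_a) + exp(r_b)) = sigmoid(r_a - r_b)."""
    d = reward_a - reward_b
    # Numerically stable sigmoid: avoid calling exp() on a large positive number.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log P(chosen preferred over rejected) = log(1 + exp(-(r_chosen - r_rejected)))."""
    d = reward_chosen - reward_rejected
    # Stable softplus(-d): max(-d, 0) + log1p(exp(-|d|)).
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (reward_chosen, reward_rejected) pairs from human comparisons."""
    return sum(preference_loss(c, r) for c, r in pairs) / len(pairs)
```

The loss only depends on the difference between the two rewards, so the reward model's scale is anchored purely by which output humans picked. Minimizing it pushes the preferred output's score up relative to the rejected one.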