
Hacker News


	by lukeinator42 922 days ago
	I'd also add "Deep reinforcement learning from human preferences" https://proceedings.neurips.cc/paper_files/paper/2017/file/d... and "Training language models to follow instructions with human feedback" https://proceedings.neurips.cc/paper_files/paper/2022/file/b.... These papers describe reinforcement learning from human feedback (RLHF), the approach used to train many of these LLMs, including ChatGPT.
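For anyone new to reinforcement learning from human feedback, a central step in both papers is fitting a reward model to pairwise human comparisons with a Bradley-Terry style loss: the probability that a labeler prefers output y_w over y_l is modeled as sigmoid(r(y_w) - r(y_l)), and the model minimizes the negative log of that probability. Below is a minimal, dependency-free sketch of just that loss, assuming the reward scores are already computed. The function names are hypothetical and not from either paper; a real implementation would compute the rewards with a neural network and backpropagate through this loss.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Compute -log(sigmoid(z)) = log(1 + exp(-z)) without overflow or log(0)."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    # For negative z, rewrite as -z + log(1 + exp(z)) so exp() stays small.
    return -z + math.log1p(math.exp(z))


def preference_probability(r_preferred: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the human prefers the first output."""
    return 1.0 / (1.0 + math.exp(-(r_preferred - r_rejected)))


def reward_model_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean pairwise loss over (preferred_reward, rejected_reward) pairs.

    Hypothetical helper: each pair holds the reward model's scores for the
    output a labeler preferred and the output they rejected.
    """
    total = 0.0
    for r_w, r_l in pairs:
        total += neg_log_sigmoid(r_w - r_l)
    return total / len(pairs)


if __name__ == "__main__":
    # Equal scores: the model is indifferent, so the loss is log(2).
    print(reward_model_loss([(0.0, 0.0)]))
    # Scoring the preferred output higher lowers the loss.
    print(reward_model_loss([(2.0, 0.0), (1.5, -0.5)]))
```

Minimizing this loss pushes the reward model to score the human-preferred output higher; the policy (the LLM) is then fine-tuned with RL to maximize that learned reward.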