by roborovskis 554 days ago
https://stable-baselines3.readthedocs.io/en/master/ is a great resource for hacking on RL implementations. There are many good RL courses out there, but https://www.youtube.com/playlist?list=PLwRJQ4m4UJjNymuBM9Rdm... is my personal favorite.

For LLMs / RLHF it's a little more difficult, but https://github.com/huggingface/alignment-handbook and the Zephyr project are a good collection of models / datasets / scripts that is easy to follow.

I would suggest studying the basics of RL first before diving into LLM RLHF, which is much harder to learn on a single GPU.

1 comment

Hi, the Zephyr link may be what I'm looking for. Yeah, I'm quite familiar with RL already, so it was specifically RLHF that I was asking about. I'll check out that resource, thanks!