Hacker News
by sabareesh 542 days ago
You might find the information here helpful: https://sabareesh.com/posts/llm-intro/ But I am still in the process of evaluating the post-training process with RL. RLHF is almost a mirage: it shows what is possible, but not the full capability of what the model can do.