by mountainriver 423 days ago
I felt like this was already known, right? My understanding was always that the base model had all the paths and RL was learning to navigate them.