Hacker News new | ask | show | jobs
by skumar17 463 days ago
That’s a good observation. For this project, I found that while the base model could “read” the image, it didn’t really understand how to use it. GRPO allowed it to effectively search the solution space.