by andai 5 days ago
Fantastic. Could you share more details on what it was like post-training a model?

The RL is easy to describe but hard to do. The nice thing about pen testing is that the reward isn't a vibe, the way it is when you train for code quality: the exploit either lands or it doesn't. The day-to-day is not glamorous at all. It's mostly fighting for stable GPU access and watching a cluster sit half-idle with nodes you somehow can't book.