Hacker News
by subtick 53 days ago
Curious — how did you handle training stability early on? Was convergence an issue without heavy tuning?