Hacker News
Train a LLM from Scratch (github.com)
3 points by linhns 52 days ago
1 comment

Curious — how did you handle training stability early on? Was convergence an issue without heavy tuning?