Hacker News
LLM from scratch, part 32k – Interventions: gradient accumulation (gilesthomas.com)
2 points by gpjt 61 days ago