Hacker News
by eigenvalue 506 days ago
Fair enough, but that still uses a lot more memory during training than what DeepSeek is doing.