Hacker News
by zardinality 544 days ago
The introduction of the paper says: "Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks." They do indeed have a very strong infra team.

Do we have two completely different definitions of “infrastructure”?