by mrcggl 1086 days ago
A large H100 cluster (>10k GPUs) could likely train an LLM with 10x the training compute of GPT-4 (using FP8). GPT-4 was apparently trained on a mix of A100s and V100s.
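A rough way to sanity-check this kind of estimate: total training compute is approximately GPU count × per-GPU peak throughput × utilization × wall-clock time. The sketch below assumes NVIDIA's published dense (non-sparse) Tensor Core peaks (H100 SXM FP8 ~1,979 TFLOPS, A100 BF16 ~312 TFLOPS, V100 FP16 ~125 TFLOPS). The 40% utilization and 90-day duration are made-up illustrative values, not figures for GPT-4 or any real run.

```python
# Back-of-envelope training-compute estimate:
#   total FLOPs ~= n_gpus * peak FLOP/s per GPU * utilization * training seconds
# All run parameters below are illustrative assumptions, not reported figures.

SECONDS_PER_DAY = 86_400

# Approximate dense (non-sparse) Tensor Core peaks from NVIDIA datasheets, in FLOP/s.
PEAK_FLOPS = {
    "H100_SXM_FP8": 1979e12,
    "A100_BF16": 312e12,
    "V100_FP16": 125e12,
}


def training_flops(n_gpus: int, peak_flops: float, utilization: float, days: float) -> float:
    """Total floating-point operations a cluster delivers over a training run."""
    return n_gpus * peak_flops * utilization * days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: 10k H100s in FP8 at 40% utilization for 90 days.
    h100 = training_flops(10_000, PEAK_FLOPS["H100_SXM_FP8"], 0.40, 90)
    # Hypothetical comparison: same cluster size and duration on A100s in BF16.
    a100 = training_flops(10_000, PEAK_FLOPS["A100_BF16"], 0.40, 90)
    print(f"H100 run: {h100:.2e} FLOPs")
    print(f"A100 run: {a100:.2e} FLOPs")
    print(f"ratio:    {h100 / a100:.1f}x")
```

With equal GPU count, utilization, and duration, the ratio reduces to the per-GPU peak ratio (roughly 6x for H100 FP8 vs A100 BF16). Under these assumptions, reaching 10x would come from some combination of more GPUs, a longer run, or better utilization.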