Hacker News
by mrcggl 1086 days ago
A large H100 cluster (>10k GPUs) running in FP8 could likely train an LLM with roughly 10x the training compute of GPT-4, which was apparently trained on a mix of A100s and V100s.
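
For anyone who wants to sanity-check that kind of claim, here is a rough back-of-envelope sketch. Everything numeric in it is an assumption rather than something from the comment: GPT-4's training compute has not been published, so `GPT4_FLOPS_ASSUMED` is only a placeholder to replace with your own estimate; `H100_FP8_PEAK` is an approximate dense FP8 peak for an H100 SXM; and `UTILIZATION` is a guessed sustained fraction of peak. The function name `training_days` is made up for illustration.

```python
def training_days(target_flops, num_gpus, peak_flops_per_gpu, utilization):
    """Days needed to deliver target_flops at a sustained fraction of peak."""
    sustained_flops_per_sec = num_gpus * peak_flops_per_gpu * utilization
    return target_flops / sustained_flops_per_sec / 86_400  # 86,400 s per day


# ASSUMPTION: approximate dense FP8 peak per H100 SXM, in FLOP/s.
H100_FP8_PEAK = 2e15
# ASSUMPTION: placeholder only; GPT-4's real training compute is not public.
GPT4_FLOPS_ASSUMED = 2e25
# ASSUMPTION: fraction of peak actually sustained during training.
UTILIZATION = 0.35

days = training_days(
    target_flops=10 * GPT4_FLOPS_ASSUMED,  # the "10x" from the comment
    num_gpus=10_000,                       # the ">10k GPUs" lower bound
    peak_flops_per_gpu=H100_FP8_PEAK,
    utilization=UTILIZATION,
)
print(f"{days:.0f} days under these assumptions")
```

The result scales linearly with every input, so doubling the GPU count or the utilization halves the time, and a different guess for GPT-4's compute shifts it proportionally.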