| HN Mirror



	by mrcggl 1086 days ago
	A large H100 cluster (>10k GPUs) could likely train an LLM with 10x the compute (in FP8) of GPT-4, which was apparently trained on a mix of A100s and V100s.
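
A rough way to sanity-check a claim like this is to multiply GPU count, peak throughput, utilization, and run length. The sketch below does that in Python. The peak figures are the dense (no sparsity) numbers from NVIDIA's datasheets for the H100 SXM (FP8) and the A100 (BF16). Everything else is an assumption made for illustration: the 35% utilization, the 90-day run, the 10,000-GPU counts, and the all-A100 baseline. The baseline is hypothetical and is not GPT-4's actual training setup, which has not been publicly confirmed.

```python
# Back-of-the-envelope training-compute estimate.
# Peak figures are vendor datasheet numbers (dense, no sparsity).
# GPU counts, utilization, and run length are illustrative assumptions.

H100_FP8_PEAK = 1979e12    # FLOP/s, H100 SXM, FP8 dense
A100_BF16_PEAK = 312e12    # FLOP/s, A100, BF16 dense

SECONDS_PER_DAY = 86_400


def training_flops(n_gpus, peak_flops, utilization, days):
    """Total FLOPs a cluster delivers over one training run."""
    return n_gpus * peak_flops * utilization * days * SECONDS_PER_DAY


def compute_ratio(new_run, old_run):
    """How many times more compute new_run delivers than old_run."""
    return training_flops(**new_run) / training_flops(**old_run)


# Hypothetical runs: same size, same duration, same utilization.
h100_run = dict(n_gpus=10_000, peak_flops=H100_FP8_PEAK, utilization=0.35, days=90)
a100_run = dict(n_gpus=10_000, peak_flops=A100_BF16_PEAK, utilization=0.35, days=90)

ratio = compute_ratio(h100_run, a100_run)
print(f"{ratio:.1f}x")  # prints "6.3x"
```

Under these assumptions, a like-for-like swap gives about 6.3x, so reaching 10x would take roughly 1.6x more GPU-hours (more GPUs, a longer run, or both). Real FP8 utilization may well differ from BF16 utilization, which this sketch ignores.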