


	by adrian_b 10 days ago
	TFA mentions that, until now, specialized and very expensive hardware such as Cerebras was required to reach these speeds. It emphasizes that the novelty of their result is exceeding 1000 tokens/s on a model with over 1T parameters using only standard hardware, i.e. a single server with 8 GPUs.