
	by ilaksh 525 days ago
	I assume people are aware, but Cerebras has a web demo and an API that are both open to try. It runs at 2000 tokens per second for Llama 3.3 70B and 1000 tokens per second for Llama 3.1 405B. https://cerebras.ai/inference
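
To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. It only does arithmetic on the two figures quoted above and does not call any Cerebras API. The 500-token response length and the helper name `generation_seconds` are assumptions for illustration, and the estimate ignores time to first token and network latency.

```python
# Throughput figures as quoted in the comment (tokens per second).
QUOTED_TOKENS_PER_SECOND = {
    "Llama 3.3 70B": 2000,
    "Llama 3.1 405B": 1000,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady decode rate.

    Hypothetical helper: ignores time to first token and network latency.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 500  # assumed response length, chosen for illustration
    for model, rate in QUOTED_TOKENS_PER_SECOND.items():
        seconds = generation_seconds(response_tokens, rate)
        print(f"{model}: {seconds:.2f} s for {response_tokens} tokens")
```

At the quoted rates, a 500-token answer would stream in about a quarter of a second on the 70B model and about half a second on the 405B model, if those numbers hold for a given request.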