Hacker News
by ilaksh 525 days ago
I assume people are aware, but Cerebras has a web demo and an API that are open to try. It runs at 2000 tokens per second for Llama 3.3 70B and 1000 tokens per second for Llama 3.1 405B.

https://cerebras.ai/inference
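
To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It only does the arithmetic on the two rates quoted above and does not call the Cerebras API. The 1000-token response length and the function name `generation_seconds` are my own assumptions for illustration, and it ignores network latency and time to first token.

```python
# Throughput figures quoted in the comment above (tokens per second).
QUOTED_TOKENS_PER_SECOND = {
    "Llama 3.3 70B": 2000,
    "Llama 3.1 405B": 1000,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady rate.

    Assumes constant throughput and ignores latency before the first token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 1000  # hypothetical response length, not from the comment
    for model, rate in QUOTED_TOKENS_PER_SECOND.items():
        seconds = generation_seconds(response_tokens, rate)
        print(f"{model}: {seconds:.2f} s for {response_tokens} tokens")
```

So at the quoted rates, a 1000-token answer would stream in about half a second on the 70B model and about one second on the 405B model, if the throughput holds steady.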