by ilaksh 525 days ago
I assume people are aware, but Cerebras has a web demo and an API that are open to try, and they run at 2000 tokens per second for Llama 3.3 70B and 1000 tokens per second for Llama 3.1 405B.
https://cerebras.ai/inference
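
To put those rates in perspective, here is a rough back-of-the-envelope sketch of how long a response would take to stream at each speed. The 500-token response length and the helper name `generation_seconds` are my own assumptions for illustration, and this ignores network latency and time to first token, so treat it as an estimate rather than a benchmark.

```python
# Decode rates quoted in the comment above, in tokens per second.
RATES_TOK_PER_S = {
    "Llama 3.3 70B": 2000,
    "Llama 3.1 405B": 1000,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate.

    Ignores network latency and time to first token (a simplifying assumption).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 500  # hypothetical response length, not from the comment
    for model, rate in RATES_TOK_PER_S.items():
        seconds = generation_seconds(response_tokens, rate)
        print(f"{model}: {seconds:.2f} s for {response_tokens} tokens")
```

At the quoted rates, a response of that size would finish in well under a second on either model.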