by
zone411
622 days ago
They have a cloud platform. I just ran a test query on their version of Llama 3.1 70B and got 566 tokens/sec.
greesil
622 days ago
Is that a lot? Do they have MLPerf submissions?
zone411
622 days ago
Yes, that's very fast. The same query got 249 tokens/s on Groq, which is known for fast AI inference, and 25 tokens/s on Together.ai. However, it's unclear what quantization (if any) was used, and this is just a spot check, not a true benchmark.
https://www.zdnet.com/article/cerebras-did-not-spend-one-min...
Tetraslam
622 days ago
Met them at an MIT event last week; they don't quantize any models.