by geerlingguy, 558 days ago
It's a little under 1 token/sec using Ollama, but that was with stock llama.cpp. Apparently Ampere has its own optimized version that runs a bit better on the AmpereOne. I haven't tested that version with 405b yet.
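If you want to reproduce a tokens/sec figure like this, `ollama run --verbose` prints an "eval rate" after each reply. The same number can be derived from the final JSON of Ollama's `/api/generate` endpoint. Here's a rough sketch, assuming that response includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The helper name and the sample numbers are made up for illustration and are not my measurement:

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama /api/generate final response.

    Assumes the response carries `eval_count` (tokens generated) and
    `eval_duration` (nanoseconds spent generating them).
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical numbers, not a benchmark: 90 tokens generated in 100 s.
sample = {"eval_count": 90, "eval_duration": 100_000_000_000}
print(f"{tokens_per_second(sample):.2f} tokens/sec")  # prints "0.90 tokens/sec"
```

This only counts generation time. Prompt processing is reported separately (`prompt_eval_count` / `prompt_eval_duration`), so it won't drag the number down.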