


	by geerlingguy 558 days ago
	It's a little under 1 token/sec using Ollama, but that was with stock llama.cpp; apparently Ampere has their own optimized version that runs a little better on the AmpereOne. I haven't tested it yet with 405B.