
	by mmoustafa 133 days ago
	I would love to see real-life tokens/sec values advertised for one or more specific open-source models. I'm currently shopping for offline hardware, and it is very hard to estimate the performance I will get before dropping $12K. I would love to have a baseline guarantee, e.g., that I will always get at least 40 tok/s running GPT-OSS-120B with Ollama on Ubuntu out of the box.
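One way to get real-life numbers is to measure them yourself on a machine you can borrow or rent before buying. A minimal sketch, assuming a stock Ollama install serving its HTTP API on `localhost:11434`: a non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), and decode speed is their ratio. The helper names `decode_tokens_per_second` and `measure_decode_speed` are hypothetical, not part of Ollama.

```python
import json
import urllib.request

# Assumed: a default local Ollama server listening on port 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"


def decode_tokens_per_second(resp: dict) -> float:
    """Decode speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return resp["eval_count"] * 1e9 / resp["eval_duration"]


def measure_decode_speed(model: str, prompt: str, url: str = OLLAMA_URL) -> float:
    """Run one non-streaming generation and return its decode tok/s."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return decode_tokens_per_second(json.load(r))
```

With a model already pulled, a call like `measure_decode_speed("gpt-oss:120b", "Explain RAID 5 in two paragraphs.")` returns a single tok/s figure (the `gpt-oss:120b` tag is an assumption; check `ollama list` for the exact name). Longer outputs give steadier numbers than a one-line reply.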

2 comments

hpcjoe 133 days ago

Look for llmfit on GitHub; it helps with exactly that analysis. I've found it reasonably accurate. If you already have Ollama installed, it can download the relevant models directly.

atwrk 132 days ago

For reference, $12K gets you at least 4 Strix Halo boxes, each running GPT-OSS-120B at ~50 tok/s.
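A common rule of thumb for sanity-checking a figure like this: single-stream decoding is usually limited by memory bandwidth, because each generated token has to read the active weights from memory once, so tok/s is at most bandwidth divided by bytes read per token. The sketch below assumes roughly 256 GB/s for Strix Halo (256-bit LPDDR5X-8000), roughly 5.1B active parameters per token for GPT-OSS-120B (a mixture-of-experts model), and roughly 4.25 bits per weight for its MXFP4-quantized weights; treat all three as approximate inputs, not measurements. The function name `decode_ceiling_tok_s` is hypothetical.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billion: float,
                         bits_per_weight: float) -> float:
    """Bandwidth-bound upper limit on single-stream decode speed.

    Assumes every generated token streams all active weights from memory
    once. Ignores KV-cache traffic, compute limits and software overhead,
    so measured speeds land below this ceiling.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumed inputs (approximate, not measured):
STRIX_HALO_GB_S = 256.0       # 256-bit LPDDR5X-8000 memory bus
GPT_OSS_120B_ACTIVE_B = 5.1   # active parameters per token, in billions
MXFP4_BITS = 4.25             # 4-bit values plus a shared 8-bit scale per 32 weights

ceiling = decode_ceiling_tok_s(STRIX_HALO_GB_S, GPT_OSS_120B_ACTIVE_B, MXFP4_BITS)
print(f"ceiling ~ {ceiling:.0f} tok/s")
```

This prints a ceiling of roughly 94 tok/s, so ~50 tok/s in practice is around half the bandwidth limit, which is plausible once attention, KV-cache reads and runtime overhead are counted. Plugging in another machine's memory bandwidth gives a quick first-pass comparison before spending money.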