by mmoustafa 88 days ago
I would love to see real-life tokens/sec values advertised for one or more specific open-source models.

I'm currently shopping for offline hardware, and it is very hard to estimate the performance I will get before dropping $12K. I would love to have a baseline that I can always count on, e.g. 40 tok/s running GPT-OSS-120B using Ollama on Ubuntu out of the box.

2 comments

Look for llmfit on GitHub; it will help with that analysis. I've found it reasonably accurate. If you already have Ollama installed, it can download the relevant models directly.
For reference, $12K gets you at least 4 Strix Halo boxes, each running GPT-OSS-120B at ~50 tok/s.
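
If you want to check a tok/s figure like this on hardware you can get your hands on, here is a minimal sketch, not a definitive benchmark. It assumes a local Ollama server on its default port (11434) and reads the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's `/api/generate` endpoint returns. The helper names `tokens_per_second` and `measure` are made up for illustration.

```python
import json
import urllib.request


def tokens_per_second(resp: dict) -> float:
    """Generation speed computed from an Ollama /api/generate response."""
    # eval_count: number of tokens generated
    # eval_duration: time spent generating them, in nanoseconds
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)


def measure(model: str, prompt: str,
            host: str = "http://localhost:11434") -> float:
    """Send one non-streaming request and return the measured tok/s.

    Assumes an Ollama server is reachable at `host` and that `model`
    has already been pulled.
    """
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return tokens_per_second(json.load(r))
```

Calling something like `measure("gpt-oss:120b", "Explain TCP slow start.")` would give one data point (the model tag here is an assumption; use whatever tag you actually pulled). A single run is noisy, so it is probably worth averaging a few runs with prompts of different lengths before trusting the number.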