| HN Mirror



	by hamdingers 216 days ago
	Am I missing it, or is there no information about performance? I'm looking for a tokens/sec figure.

2 comments

Right now I get 59 tok/sec on GPT-OSS 120B using Unsloth's dynamic 4-bit quants via llama.cpp: https://news.ycombinator.com/item?id=45881049

He didn't give that info, but the transcript linked at the end shows how much time was spent on each query.