


	by vikp 898 days ago
	The size of the framework is not the most important factor - the model weights are usually 10x+ the size of the framework. The most important factor is inference speed. For something called Nitro, I really expected speed benchmarks. I'd be interested in CPU, CUDA, and MPS at different batch sizes.
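A minimal sketch of the kind of benchmark asked for here: tokens per second at several batch sizes. Everything in it is an assumption for illustration. `measure_throughput` and the `generate` callable are hypothetical names, not part of Nitro or llama.cpp; `generate` stands for whatever backend is under test (CPU, CUDA, or MPS) and is assumed to return the number of tokens it produced.

```python
import time
from typing import Callable, Dict, List


def measure_throughput(
    generate: Callable[[List[str], int], int],
    batch_sizes: List[int],
    max_new_tokens: int = 32,
    repeats: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, float]:
    """Return the best observed tokens/second for each batch size.

    `generate(prompts, max_new_tokens)` is a hypothetical hook: it should run
    one batched generation call and return the total number of tokens produced.
    """
    results: Dict[int, float] = {}
    for batch_size in batch_sizes:
        prompts = ["Hello"] * batch_size  # placeholder prompts
        best = 0.0
        for _ in range(repeats):
            start = clock()
            produced = generate(prompts, max_new_tokens)
            elapsed = clock() - start
            if elapsed > 0:
                # Keep the fastest run to reduce noise from warm-up and caching.
                best = max(best, produced / elapsed)
        results[batch_size] = best
    return results


if __name__ == "__main__":
    # Stand-in backend that only sleeps; replace with a real inference call.
    def fake_generate(prompts: List[str], max_new_tokens: int) -> int:
        time.sleep(0.01)
        return len(prompts) * max_new_tokens

    for size, tps in measure_throughput(fake_generate, [1, 4, 16]).items():
        print(f"batch={size}: {tps:.1f} tokens/s")
```

Running the same harness against each backend, with identical model and settings, would give directly comparable numbers.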


p-e-w 898 days ago

AFAICT, Nitro is just a wrapper around llama.cpp. Therefore, you can simply look at llama.cpp benchmarks, of which there are plenty.


UnlockedSecrets 898 days ago

In my testing, Oobabooga and other front ends and similar projects have shown upwards of a 50% difference in inference speed on the same model and settings, so benchmarks are still useful.


brucethemoose2 898 days ago

Ooba is an outlier, and has tons of overhead over llama.cpp and llama-cpp-python for some reason.

Most llama.cpp OpenAI-compatible servers are pretty close to vanilla llama.cpp, albeit without the batching support.
