Hacker News
by p-e-w 898 days ago
AFAICT, Nitro is just a wrapper around llama.cpp. Therefore, you can simply look at llama.cpp benchmarks, of which there are plenty.

Oobabooga and other front ends and similar projects have, in my testing, shown upwards of a 50% difference in inference speed on the same model and settings, so benchmarks are still useful.
Ooba is an outlier and has tons of overhead compared to llama.cpp and llama-cpp-python, for some reason.

Most OpenAI-compatible llama.cpp servers are pretty close to vanilla llama.cpp in speed, albeit without batching support.
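
If you want to check a gap like the 50% figure mentioned above on your own hardware, a rough sketch along these lines might help. It is only an illustration: `generate` is a hypothetical callable that you would wrap around whichever front end or server you are testing, and it is assumed to return the number of tokens it produced. Nothing here calls a real llama.cpp, Nitro, or Oobabooga API.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], runs: int = 3) -> float:
    """Run `generate` several times and return the best observed tokens/sec.

    `generate` is a hypothetical wrapper around the backend under test;
    it must return the number of tokens it produced.
    """
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate()
        elapsed = time.perf_counter() - start
        best = max(best, n_tokens / elapsed)
    return best


def relative_slowdown(baseline_tps: float, candidate_tps: float) -> float:
    """Fraction by which the candidate is slower than the baseline.

    A value of 0.5 means the candidate reaches half the baseline throughput.
    """
    return (baseline_tps - candidate_tps) / baseline_tps
```

Keep the model, quantization, prompt, and sampling settings identical across the backends you compare, otherwise the difference you measure says little about front-end overhead.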