by UnlockedSecrets 899 days ago
Oobabooga and other front ends and similar projects have, in my testing, shown upwards of a 50% difference in inference speed on the same model and settings, so benchmarks are still useful.

Ooba is an outlier and adds tons of overhead on top of llama.cpp and llama-cpp-python, for some reason.

Most llama.cpp OpenAI-compatible servers are pretty close to vanilla llama.cpp in speed, albeit without the batching support.
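
If anyone wants to check the overhead claims on their own setup, here is a rough sketch of how the comparison could be timed. Everything in it is an assumption for illustration: `tokens_per_second` and `relative_overhead` are made-up helper names, and `generate` stands in for whatever call you use to hit a given front end and get back the number of tokens produced. It is not any project's actual benchmark.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[str], int], prompt: str, runs: int = 3) -> float:
    """Average generation throughput over several runs.

    `generate` is a hypothetical callable: it sends `prompt` to one backend
    (same model, same settings) and returns how many tokens were generated.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate(prompt)
        total_seconds += time.perf_counter() - start
    if total_seconds <= 0:
        raise ValueError("elapsed time was zero; cannot compute throughput")
    return total_tokens / total_seconds


def relative_overhead(baseline_tps: float, candidate_tps: float) -> float:
    """Fraction of baseline throughput lost by the candidate (0.5 means 50% slower)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_tps - candidate_tps) / baseline_tps
```

Usage would be something like measuring vanilla llama.cpp as the baseline, measuring a front end with the same prompt, and passing both numbers to `relative_overhead`. Keep the model, quantization, context size, and sampling settings identical, otherwise the numbers are not comparable.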