| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by UnlockedSecrets 899 days ago
	Oobagooda and other front ends and similar projects have in my testing had upwards of a 50% difference in inference speed on the same model and settings, So benchmarks are still useful.

1 comments

Ooba is an outlier, and has tons of overhead over llama.cpp and llama-cpp-python for some reason.

Most llama.cpp openai servers are pretty close to vanilla llama.cpp, albeit without the batching support.