| HN Mirror



	by tracerbulletx 859 days ago
	It's a different inference engine with different capabilities, and it should be a lot faster on Nvidia cards. I don't have comparative benchmarks against llama.cpp, but if you find some llama.cpp numbers, compare them to these: https://nvidia.github.io/TensorRT-LLM/performance.html https://github.com/lapp0/lm-inference-engines/