Hacker News
by Der_Einzige 559 days ago
Not even close. Llama.cpp isn't even close to a production-ready LLM inference engine, and it runs overwhelmingly faster when using CUDA.