Hacker News
by clarionbell 352 days ago
No, they don't. Why would they? Most of them use a single inference engine, most likely developed in-house, or they go for something like vLLM. llama.cpp especially is under their radar.

The reason is simple: there isn't much money in it. llama.cpp is free and targets the lower end of the hardware spectrum. Corporations will run something else or, even more likely, offload the task to a contractor.

1 comments

The chat template issues are actually not specific to llama.cpp; they affect all engines (including vLLM, SGLang, etc.). For example, see https://www.reddit.com/r/unsloth/comments/1l97eaz/deepseekr1... - which fixed tool calling for DeepSeek R1.