| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by svachalek 472 days ago
	Yeah the state of the art is pretty awful. There have been multiple incidents where a model has been dropped on ollama with the wrong chat template, resulting in it seeming to work but with greatly degraded performance. And I think it's always been a user that notices, not the ollama team or the model team.

1 comments

refulgentis 472 days ago

I'm grateful for anyone's contributions to anything, but I kinda shake my head about ollama. the reason stuff like this happens is they're doing the absolute minimal job necessary, to get the latest model running, not working.

I make a llama.cpp wrapper myself, and it's somewhat frustrating putting effort in for everything from big obvious UX things, like error'ing when the context is too small for your input instead of just making you think the model is crap, to long-haul engineering commitments, like integrating new models with llama.cpp's new tool calling infra, and testing them to make sure it, well, actually works.

I keep telling myself that this sort of effort pays off a year or two down the road, once all that differentiation in effort day-to-day adds up. I hope :/

link

Karrot_Kream 472 days ago

Can you link your wrapper? I've read and run up against a lot of footguns related to Ollama myself and I think surfacing community efforts to do better would be quite useful.

link

refulgentis 472 days ago

Cheers, thanks for your interest:

Telosnex, @ telosnex.com --- fwiw, general positioning is around paid AIs, but there's a labor-of-love llama.cpp backed on device LLM integration that makes them true peers, both in UI and functionality. albeit with a warning sign because normie testers all too often wander into trying it on their phone and killing their battery.

My curse is the standard engineer one - only place I really mention it is one-off in comments like here to provide some authority on a point I want to make...I'm always one release away from it being perfect enough to talk up regularly.

I really really need to snap myself awake and ban myself from the IDE for a month.

But this next release is a BFD, full agentic coding, with tons of tools baked in, and I'm so damn proud to see the extra month I've spent getting llama.cpp tools working agentically too. (https://x.com/jpohhhh/status/1897717300330926109, real thanks is due to @ochafik at Google, he spent a very long term making a lot of haphazard stuff in llama.cpp coalesce. also phi-4 mini. this is the first local LLM that is reasonably fast and an actual drop-in replacement for RAG and tools, after my llama.cpp patch)

Please, feel free to reach out if you try it and have any thoughts, positive or negative. james @ the app name.com

link