Hacker News new | ask | show | jobs
by coder543 72 days ago
There are issues with the chat template right now[0], so tool calling does not work reliably[1].

Every time people try to rush to judge open models on launch day... it never goes well. There are ~always bugs on launch day.

[0]: https://github.com/ggml-org/llama.cpp/pull/21326

[1]: https://github.com/ggml-org/llama.cpp/issues/21316

2 comments

What causes these? Given how simple the LLM interface is (just completion), why don't teams make a simple, standardized template available with their model release so the inference engine can just read it and work properly? Can someone explain the difficulty with that?
The model does have the format specified but there is no _one_ standard. For this model it’s defined in the [ tokenizer_config.json [0]. As for llama.cpp they seem to be using a more type safe approach to reading the arguments.

[0] https://huggingface.co/google/gemma-4-31B-it/blob/main/token...

Hm, but surely there will be converters for such simple formats? I'm confused as to how there can be calling bugs when the model already includes the template.
was just merged
It was just an example of a bug, not that it was the only bug. I’ve personally reported at least one other for Gemma 4 on llama.cpp already.

In a few days, I imagine that Gemma 4 support should be in better shape.