| HN Mirror


	by nextaccountic 5 days ago
	> The model invents new categories (e.g. apartments) and doesn’t stick to the provided list of allowed categories

	Can this specific failure mode be solved by providing a grammar that the output must adhere to? (Not sure if Qwen has this feature; it's used e.g. to ensure the output is parseable JSON.)

3 comments

nl 5 days ago

It can.

It's implemented by whatever runs the model (e.g. llama.cpp) rather than by the model itself.

Note that it is hard to make work if you turn thinking on because the grammar gets complicated quickly (I don't recall if Qwen 0.6B can do thinking).
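For the allowed-categories case in the question, the grammar can be tiny. Here is a minimal sketch that builds one in GBNF, the grammar format llama.cpp reads. The helper name `gbnf_for_categories` and the category list are made up for illustration, and how the resulting string is handed to the runtime depends on the frontend you use.

```python
def gbnf_for_categories(categories):
    """Build a GBNF grammar whose only valid outputs are the given strings."""
    def quote(s):
        # Escape backslashes and double quotes for a GBNF string literal.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    alternatives = " | ".join(quote(c) for c in categories)
    return f"root ::= {alternatives}"


# Hypothetical category list, not taken from the article.
grammar = gbnf_for_categories(["groceries", "rent", "transport"])
print(grammar)
```

With a grammar like this, an invented category such as "apartments" cannot be produced, because no token sequence spelling it is ever accepted.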

nextaccountic 4 days ago

Just one question. If I'm running a local model, can I do something other than just a context-free grammar? Does it make sense to have something more general, or would it be just too slow?

I guess the only hard constraint is to not have backtracking, right? To not waste previously emitted tokens

aesthesia 4 days ago

Thinking shouldn't be too hard to deal with: just let the model generate freely until it hits a </think> token, then do constrained decoding, right?
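Roughly, the idea can be sketched as a two-phase loop. This is a toy, not how any particular runtime does it: `toy_model` is a made-up stand-in that returns a score per candidate token, decoding is greedy, and the constrained answer is assumed to be a single token.

```python
THINK_END = "</think>"


def two_phase_decode(propose, allowed_answers, max_steps=50):
    """Decode freely until THINK_END, then pick only from allowed_answers.

    propose(tokens_so_far) -> dict mapping candidate token -> score.
    """
    out = []
    constrained = False
    for _ in range(max_steps):
        scores = propose(out)
        if constrained:
            # Phase 2: drop every candidate that is not an allowed answer.
            scores = {t: s for t, s in scores.items() if t in allowed_answers}
            if not scores:
                raise ValueError("no allowed token was proposed")
        token = max(scores, key=scores.get)  # greedy choice
        out.append(token)
        if constrained:
            break  # the toy answer is a single token
        if token == THINK_END:
            constrained = True  # switch to phase 2 for the next step
    return out


def toy_model(tokens):
    # Hypothetical stand-in for a real model's next-token scores.
    if not tokens:
        return {"hmm": 1.0, THINK_END: 0.1, "rent": 0.2}
    if tokens[-1] == "hmm":
        return {THINK_END: 2.0, "hmm": 0.5}
    # After thinking, the model would prefer an invented category.
    return {"apartments": 3.0, "rent": 2.0, "groceries": 0.5}


result = two_phase_decode(toy_model, allowed_answers={"rent", "groceries"})
print(result)
```

The unconstrained model would have answered "apartments" here; the mask in the second phase leaves "rent" as the best remaining option.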

stymaar 4 days ago

Sure, but does llama.cpp support that?

nl 4 days ago

It does, and this is how I did it.

But getting that grammar right, and making it work with the correct Jinja template so that thinking mode is enabled and its output parsed out, was a lot more work than I expected.

thomascountz 4 days ago

Yes, you can use constrained decoding, e.g. logit masking: set the logits of all invalid tokens in the vocabulary to -inf so that they are effectively removed from selection. I believe llama.cpp exposes this by accepting a formatted grammar.
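A minimal sketch of the masking step in plain Python, assuming at least one token is allowed. The three-token vocabulary and the logit values are invented for illustration.

```python
import math


def mask_logits(logits, allowed):
    """Set the logit of every token id not in `allowed` to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def softmax(logits):
    # Subtract the largest finite logit for numerical stability.
    m = max(x for x in logits if x != -math.inf)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and raw model scores.
vocab = ["rent", "groceries", "apartments"]
logits = [1.0, 0.5, 3.0]
allowed = {0, 1}  # "apartments" is not on the allowed list

probs = softmax(mask_logits(logits, allowed))
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
```

The masked token ends up with probability exactly zero, so no sampling strategy can select it, and the remaining probability mass is renormalized over the allowed tokens.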

mijoharas 4 days ago

This was my thought as well. I'm surprised that it's not being used here (afaict)
