by nl
5 days ago
It can. It's implemented by whatever runs the model (e.g. Llama.cpp) rather than by the model itself. Note that it's hard to make work with thinking turned on, because the grammar gets complicated quickly (I don't recall whether Qwen 0.6B can do thinking).
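To make that concrete, here's a rough toy sketch of what the runtime does at each step: work out which tokens the grammar allows next, mask everything else, then pick from what's left. This is not how Llama.cpp actually implements it; the vocabulary, the "grammar" (just a list of valid outputs), and `fake_logits` are all made up for illustration.

```python
import math

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["yes", "no", "maybe", "{", "}", "<eos>"]

# Toy "grammar" (an assumption for this sketch): the complete output must be
# exactly one of these token sequences.
VALID_OUTPUTS = [
    ["{", "yes", "}", "<eos>"],
    ["{", "no", "}", "<eos>"],
]


def allowed_next(prefix):
    """Return the tokens that keep prefix + [token] a prefix of a valid output."""
    n = len(prefix)
    return {seq[n] for seq in VALID_OUTPUTS if len(seq) > n and seq[:n] == prefix}


def fake_logits(prefix):
    """Stand-in for the model: it always prefers 'maybe', which the grammar forbids."""
    scores = {"maybe": 5.0, "yes": 2.0, "no": 1.0, "{": 0.5, "}": 0.4, "<eos>": 0.1}
    return [scores[tok] for tok in VOCAB]


def constrained_decode(max_steps=10):
    """Greedy decoding where disallowed tokens are masked out before selection."""
    out = []
    for _ in range(max_steps):
        logits = fake_logits(out)
        allowed = allowed_next(out)
        # Disallowed tokens get -inf so they can never be selected.
        masked = [
            score if tok in allowed else -math.inf
            for tok, score in zip(VOCAB, logits)
        ]
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        out.append(VOCAB[best])
        if out[-1] == "<eos>":
            break
    return out


print(constrained_decode())
```

Since the mask is applied before each token is chosen, every emitted token keeps the output a valid prefix, so nothing ever needs to be taken back. The model itself is untouched; only the sampling step changes.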
I guess the only hard constraint is to not have backtracking, right? That is, to not waste previously emitted tokens.