That's a good point! We're actually working on integrating this as well, but in practice we've found that LLMs in general don't like to respond with empty strings, for example.
My hypothesis here is that, due to RLHF, there's likely some implicit learning that tangentially related content is better than no content.
Given that, you'd likely still get better results with your schema being "string | null", so the LLM can output null instead of "", since there is probably not much training data that gives "" a high log probability.
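To make the "string | null" idea concrete, here is a minimal sketch of what such a field could look like in JSON Schema, plus a small normalizer for the model's reply. The field name `middle_name` and the helper `normalize` are hypothetical, made up for illustration; this is not anyone's actual pipeline.

```python
import json

# Hypothetical schema: the field may be a string or an explicit null,
# so the model has a natural way to say "no value" instead of "".
SCHEMA = {
    "type": "object",
    "properties": {
        "middle_name": {"type": ["string", "null"]},
    },
    "required": ["middle_name"],
}


def normalize(raw: str) -> dict:
    """Parse the model's JSON reply and fold empty strings into None."""
    data = json.loads(raw)
    value = data.get("middle_name")
    # Treat "" (or whitespace only) the same as null, in case the model
    # emits an empty string anyway.
    if isinstance(value, str) and value.strip() == "":
        value = None
    return {"middle_name": value}
```

Folding "" into null on the way in means downstream code only has to handle one "missing" representation, whichever one the model happens to pick.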
But we're looking forward to evaluating function calling and seeing what the metrics show!
Constrained generation should not require calling supplemental functions. It's as simple as banning or reducing the weight of the naughty tokens. There are several libraries which enable this without function calling (Microsoft Guidance, Jsonformer, LMQL).
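A toy sketch of the token-banning idea, assuming you have direct access to the next-token logits. The vocabulary, logit values, and function names below are invented for illustration; this is not the API of Guidance, Jsonformer, or LMQL, only the underlying masking step in plain Python.

```python
import math


def mask_logits(logits, banned_ids, penalty=float("inf")):
    """Return a copy of the logits with banned token ids pushed down.

    An infinite penalty bans a token outright; a finite one only
    reduces its weight. Assumes the input logits are finite.
    """
    masked = list(logits)
    for i in banned_ids:
        masked[i] -= penalty
    return masked


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up 4-token vocabulary and logits for a single decoding step.
VOCAB = ['""', "null", '"hello"', "}"]
LOGITS = [2.0, 1.0, 0.5, 0.1]

# Ban the empty-string token (id 0) before sampling.
probs = softmax(mask_logits(LOGITS, banned_ids={0}))
best = max(range(len(probs)), key=probs.__getitem__)
```

With the empty-string token masked out, its probability is exactly zero and the greedy choice falls through to the next-highest logit. A real grammar-constrained decoder recomputes the banned set at every step from the schema state.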
The output is not 100% guaranteed. Be careful about that and have another layer to check the output.
I had a schema with a string enum property to categorise some inputs. One of the category names was "media/other" or something to that effect. Sometimes the output would stop at just "media" even though it wasn't a valid option in the schema.
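For what it's worth, the extra checking layer can be very small. A rough sketch, assuming the model returns a JSON object with a `category` field; the allowed values other than "media/other" and the function name `check_category` are made up for illustration.

```python
import json

# Hypothetical enum values; only "media/other" comes from the case above.
ALLOWED_CATEGORIES = {"news", "sports", "media/other"}


def check_category(raw: str) -> str:
    """Parse the model output and reject any category outside the enum."""
    data = json.loads(raw)
    category = data.get("category")
    if category not in ALLOWED_CATEGORIES:
        # A truncated value such as "media" ends up here, so the caller
        # can retry or fall back instead of storing bad data.
        raise ValueError(f"invalid category: {category!r}")
    return category
```

Calling `check_category('{"category": "media"}')` raises `ValueError`, which is the cue to retry the request or route the input to a fallback.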