Got it, thanks. Certainly a very interesting and active space. I was playing around with FLARE (https://arxiv.org/abs/2305.06983) for RAG this week, and LMQL (mentioned by another poster) seems to use a similar technique.
In response to your sister comment: the implementation we used was the naive one from LangChain (https://python.langchain.com/docs/modules/chains/additional/...). We've decomposed that to use as a starting point, and yes, early results are promising. It doesn't yet seem to be possible to get the necessary `logprobs` out of the GPT-4 API, though, so we're stuck with 3.5-turbo atm.
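For anyone wondering why `logprobs` matter here, this is a rough sketch of the confidence check as I understand it from the FLARE paper, not the LangChain code. If any token in a tentatively generated sentence falls below a probability threshold, the low-confidence tokens are dropped and the rest is used as the retrieval query. The function names, the 0.4 threshold, and the sample tokens and logprobs are all made up for illustration.

```python
import math

def needs_retrieval(logprobs, threshold=0.4):
    """True if any token's probability is below the threshold (hypothetical value)."""
    return any(math.exp(lp) < threshold for lp in logprobs)

def masked_query(tokens, logprobs, threshold=0.4):
    """Drop low-confidence tokens and join the rest into a retrieval query."""
    kept = [t for t, lp in zip(tokens, logprobs) if math.exp(lp) >= threshold]
    return "".join(kept).strip()

# Illustrative data only: a tentative sentence and per-token log-probabilities,
# in the shape a completion API with logprobs enabled might return them.
tokens = ["Joe", " Biden", " attended", " the", " University", " of", " Pennsylvania"]
logprobs = [-0.05, -0.02, -0.3, -0.1, -0.2, -0.05, -1.6]

if needs_retrieval(logprobs):
    # The model was unsure about the last token, so it is left out of the query.
    query = masked_query(tokens, logprobs)
    print("retrieve with:", query)
else:
    print("keep the sentence as generated")
```

Without per-token log-probabilities there is nothing to threshold on, which is why the missing `logprobs` are a blocker.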