Interesting paper, but their reason for dismissing constrained decoding methods seems to be that they want to academically study the in-context setting.
For practitioners, a framework like Guidance, which forces the model to emit valid JSON as it generates text, solves this trivially (https://github.com/guidance-ai/guidance).
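For anyone unfamiliar with the idea, here is a toy sketch of constrained decoding. It is not Guidance's actual implementation or API: the vocabulary, the two accepted outputs, and the random scores standing in for model logits are all made up for illustration. At each step, any token that would make the output impossible to complete as valid JSON is masked out before picking the next one.

```python
import random

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"answer"', ':', ' ', 'true', 'false', 'maybe', 'Sure!', '<eos>']

# Every complete output this toy "grammar" accepts. Real frameworks derive the
# allowed next tokens from a schema or grammar instead of enumerating strings.
VALID_OUTPUTS = ['{"answer": true}', '{"answer": false}']


def allowed_tokens(prefix):
    """Return the tokens that keep `prefix` extendable to a valid output."""
    allowed = []
    for tok in VOCAB:
        if tok == '<eos>':
            # Stopping is only allowed once the output is complete.
            if prefix in VALID_OUTPUTS:
                allowed.append(tok)
        elif any(v.startswith(prefix + tok) for v in VALID_OUTPUTS):
            allowed.append(tok)
    return allowed


def constrained_generate(rng, max_steps=20):
    """Greedy decoding over random scores, restricted to grammar-safe tokens."""
    prefix = ''
    for _ in range(max_steps):
        # Stand-in for model logits: the "model" has no idea what JSON is.
        scores = {tok: rng.random() for tok in VOCAB}
        candidates = allowed_tokens(prefix)
        tok = max(candidates, key=scores.get)
        if tok == '<eos>':
            break
        prefix += tok
    return prefix
```

Even though the scores are pure noise, `constrained_generate(random.Random(0))` can only ever return one of the two valid JSON strings, because tokens like `Sure!` are never eligible.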
And OpenAI also has Structured Outputs[1], which has the same effect as Guidance. I use it to safely deserialize remote function calls based on a JSON Schema[2]. It works very well.
To be fair, they first build a benchmark, which they call "StructuredRAG", and it doesn't make sense to run constrained decoding against that benchmark, because it would always score a 100% success rate. With the benchmark in place, they try to figure out whether it is possible to prompt-engineer your way to a 100% success rate, and by using OPRO to optimize the prompt, they did reach 100% without relying on constrained decoding.
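To make the "always 100%" point concrete, here is a rough sketch of how a format success rate could be scored. This is my own guess at the shape of such a metric, not the paper's code, and the sample responses are invented for illustration.

```python
import json


def is_valid(response, required):
    """True if `response` parses as a JSON object with the expected key types."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(
        key in data and isinstance(data[key], typ)
        for key, typ in required.items()
    )


def success_rate(responses, required):
    """Fraction of responses that pass the format check."""
    if not responses:
        return 0.0
    return sum(is_valid(r, required) for r in responses) / len(responses)


# Hypothetical outputs, made up for illustration (not taken from the paper).
unconstrained = [
    '{"answer": "Paris"}',
    'Sure! Here is the JSON: {"answer": "Paris"}',  # chatty preamble breaks parsing
    '{"answer": 42}',                               # wrong value type
    '{"answer": "Berlin"',                          # truncated output
]
constrained = ['{"answer": "Paris"}', '{"answer": "Berlin"}']

print(success_rate(unconstrained, {"answer": str}))  # 0.25
print(success_rate(constrained, {"answer": str}))    # 1.0
```

A decoder that can only emit schema-valid output passes `is_valid` by construction, so the metric stops measuring anything about the model or the prompt.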
1. https://openai.com/index/introducing-structured-outputs-in-t...
2. https://github.com/amoffat/manifest