Have you found the output for arbitrary grammars to be satisfactory? My naive assumption has been that these models will produce better JSON than other formats simply by virtue of having seen so much of it.
If you want good results, the grammar should follow the output the prompt is already expected to produce, especially if you use a small model. Normally I tune the prompt by hand until the model outputs the grammar's format on its own, and only then apply the grammar in production.
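One rough way to check whether a prompt is ready for a grammar is to measure how often the unconstrained output already fits the target format. Below is a minimal sketch of that check. The "key: value" format, the `TARGET` pattern, the `format_match_rate` helper, and the sample strings are all made up for illustration; in practice the pattern would mirror whatever your grammar accepts and the samples would come from your model.

```python
import re

# Hypothetical target format: one "key: value" pair per line.
# In practice this pattern should mirror what the grammar accepts.
LINE = r"[a-z_]+: [^\n]+"
TARGET = re.compile(rf"{LINE}(\n{LINE})*")

def format_match_rate(outputs, pattern=TARGET):
    """Return the fraction of raw outputs that already fit the format."""
    if not outputs:
        return 0.0
    matched = sum(1 for text in outputs if pattern.fullmatch(text.strip()))
    return matched / len(outputs)

# Made-up samples standing in for unconstrained model completions.
samples = [
    "name: Alice\nage: 30",
    "Sure! Here is the result:\nname: Bob",  # chatty preamble breaks the format
    "name: Carol",
]
rate = format_match_rate(samples)
print(f"{rate:.0%} of samples already match the target format")
```

If the rate is low, the grammar will be forcing the model away from what it wants to say, so I would keep adjusting the prompt before turning the grammar on.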