I imagine that validation as you go could slow things down though.
Should I choose a data format based on how easily my target LLM can output it?
Or do I just validate the output afterwards and clamp or discard whatever doesn't conform?
Always using the latter seems pretty inefficient.
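Here is roughly what I mean by the validate-afterwards option, as a minimal Python sketch. The JSON schema (a `label` from a fixed set plus an integer `score` in 0 to 10) and the `validate_output` function are made up for illustration. They are not from any particular library or LLM API.

```python
import json
from typing import Optional

# Hypothetical schema for illustration only:
# {"label": one of ALLOWED_LABELS, "score": integer in [SCORE_MIN, SCORE_MAX]}
ALLOWED_LABELS = {"positive", "negative", "neutral"}
SCORE_MIN, SCORE_MAX = 0, 10


def validate_output(raw: str, mode: str = "clamp") -> Optional[dict]:
    """Check one LLM completion after it has been fully generated.

    mode="clamp":   pull an out-of-range score back into [SCORE_MIN, SCORE_MAX].
    mode="discard": reject any record that does not conform exactly.
    Returns the cleaned record, or None if it has to be thrown away.
    """
    if mode not in ("clamp", "discard"):
        raise ValueError(f"unknown mode: {mode!r}")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # unparseable text can't be clamped, only discarded
    if not isinstance(data, dict):
        return None
    label, score = data.get("label"), data.get("score")
    # Check the type first so unhashable values (lists, dicts) don't hit the set lookup.
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        return None
    if SCORE_MIN <= score <= SCORE_MAX:
        return {"label": label, "score": score}
    if mode == "clamp":
        return {"label": label, "score": min(max(score, SCORE_MIN), SCORE_MAX)}
    return None


if __name__ == "__main__":
    outputs = ['{"label": "positive", "score": 7}',
               '{"label": "negative", "score": 42}',
               'not json']
    for mode in ("clamp", "discard"):
        print(mode, [validate_output(o, mode) for o in outputs])
```

Each `None` is a generation I've already paid for and would have to retry. Clamping also only helps once the output parses at all. That wasted work is the inefficiency I'm worried about.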