I wonder if you could do a composite image, like bracketed images, and so give the model multiple goes, for which it could amalgamate results. So, you could do an exposure bracket, do a focus/blur, maybe a stretch/compression, or an adjustment for font-height as a proportion of the image.
Feed all of the alternatives to the model, tell it they each have the same textual content?
Feed all of the alternatives to the model, tell it they each have the same textual content?