Very cool! I've done something similar to improve an OCR system on crinkled paper[0]. Blender is a powerful and totally underutilized tool for this kind of work.
Wow, this is awesome! And it's very nicely presented on the website. I'm wondering how you mapped from the UV coordinates to the 3D model. I would like to add that feature to the add-on.
TL;DR: using a KD-tree, I find the face containing the UV coordinate. Then I convert the UV coordinate to barycentric coordinates within that face, use those weights to interpolate the face's 3D vertex positions, and put the resulting point through the local -> world -> view -> perspective transform matrices.
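For anyone wanting to try the UV -> 3D step, here is a minimal, dependency-free sketch of that pipeline. It is not the author's code: the face lookup is a plain linear scan standing in for the KD-tree, faces are assumed to be triangles given as parallel lists of UV and local-space XYZ corners, and `mvp` is a hypothetical row-major 4x4 matrix the caller has already composed from the local -> world -> view -> perspective transforms. All function names are made up for illustration.

```python
from typing import Optional, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Tri2 = Tuple[Vec2, Vec2, Vec2]
Tri3 = Tuple[Vec3, Vec3, Vec3]


def barycentric(p: Vec2, tri: Tri2) -> Optional[Tuple[float, float, float]]:
    """Barycentric weights (wa, wb, wc) of p in triangle tri, or None if degenerate."""
    a, b, c = tri
    v0 = (b[0] - a[0], b[1] - a[1])
    v1 = (c[0] - a[0], c[1] - a[1])
    v2 = (p[0] - a[0], p[1] - a[1])
    den = v0[0] * v1[1] - v1[0] * v0[1]
    if abs(den) < 1e-12:
        return None  # zero-area UV triangle cannot contain a unique point
    # Solve p = a + wb*v0 + wc*v1 with Cramer's rule.
    wb = (v2[0] * v1[1] - v1[0] * v2[1]) / den
    wc = (v0[0] * v2[1] - v2[0] * v0[1]) / den
    return 1.0 - wb - wc, wb, wc


def find_face(uv: Vec2, faces_uv: Sequence[Tri2], eps: float = 1e-9):
    """Linear scan for the face whose UV triangle contains uv.

    Stand-in for the KD-tree lookup: a real version would query a tree
    (e.g. built on face UV centroids) and only test nearby candidates.
    """
    for i, tri in enumerate(faces_uv):
        w = barycentric(uv, tri)
        if w is not None and all(x >= -eps for x in w):
            return i, w
    return None


def transform_point(m: Sequence[Sequence[float]], p: Vec3) -> Vec3:
    """Apply a row-major 4x4 matrix to (x, y, z, 1), then do the perspective divide."""
    x, y, z = p
    out = [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(4)]
    w = out[3]
    return out[0] / w, out[1] / w, out[2] / w


def uv_to_ndc(uv: Vec2, faces_uv: Sequence[Tri2], faces_xyz: Sequence[Tri3], mvp) -> Optional[Vec3]:
    hit = find_face(uv, faces_uv)
    if hit is None:
        return None  # uv lies outside every UV island
    i, (wa, wb, wc) = hit
    a, b, c = faces_xyz[i]
    # The UV map is affine within a triangle, so the same weights locate the 3D point.
    local = (
        wa * a[0] + wb * b[0] + wc * c[0],
        wa * a[1] + wb * b[1] + wc * c[1],
        wa * a[2] + wb * b[2] + wc * c[2],
    )
    return transform_point(mvp, local)


# Usage: a unit UV square split into two triangles, mapped onto a 2x2 square at z=0.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
faces_uv = [((0, 0), (1, 0), (1, 1)), ((0, 0), (1, 1), (0, 1))]
faces_xyz = [((0, 0, 0), (2, 0, 0), (2, 2, 0)), ((0, 0, 0), (2, 2, 0), (0, 2, 0))]
print(uv_to_ndc((0.25, 0.5), faces_uv, faces_xyz, IDENTITY))  # (0.5, 1.0, 0.0)
```

Interpolating in local space before projecting keeps the result exact under perspective, since the perspective divide happens only once, on the final point.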
A common approach in rendering engines for mapping screen-space coordinates back to objects is to render a second image, with lighting and shadows disabled, in which each object's flat color encodes a unique ID. With 8 bits per RGB channel, you can then identify up to 2^24 (about 16.7 million) objects by reading a single pixel, without needing to maintain a KD-tree.
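The encode/decode side of that ID-pass trick can be sketched in a few lines. This is a minimal sketch assuming 8 bits per RGB channel and a buffer read back as rows of `(r, g, b)` integer tuples; reserving black as "no object" is an added convention (it leaves 2^24 - 1 usable IDs), and all names here are hypothetical.

```python
from typing import Optional, Sequence, Tuple

RGB = Tuple[int, int, int]
MAX_IDS = 1 << 24  # 8 bits per channel x 3 channels
BACKGROUND_ID = 0  # assumption: the clear color (0, 0, 0) means "no object"


def id_to_rgb(obj_id: int) -> RGB:
    """Pack a 24-bit object ID into an 8-bit-per-channel RGB color."""
    if not 0 <= obj_id < MAX_IDS:
        raise ValueError(f"object ID {obj_id} does not fit in 24 bits")
    return (obj_id >> 16) & 0xFF, (obj_id >> 8) & 0xFF, obj_id & 0xFF


def rgb_to_id(rgb: RGB) -> int:
    """Unpack the color read from the ID pass back into the object ID."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


def pick(id_buffer: Sequence[Sequence[RGB]], x: int, y: int) -> Optional[int]:
    """Return the object ID under pixel (x, y), or None for background."""
    obj_id = rgb_to_id(id_buffer[y][x])
    return None if obj_id == BACKGROUND_ID else obj_id


# Toy 2x2 "ID pass": object 70000 covers the left column, nothing on the right.
bg = id_to_rgb(BACKGROUND_ID)
obj = id_to_rgb(70000)
buffer = [[obj, bg], [obj, bg]]
print(pick(buffer, 0, 1))  # 70000
print(pick(buffer, 1, 0))  # None
```

This only works if the pixels come back bit-exact: besides lighting and shadows, anti-aliasing, texture filtering, dithering, and color management or tone mapping also have to be off, because any of them can blend or shift the encoded colors into a different (wrong) ID.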