What did you use to extract the embedded text during this step, other than some other OCR tech?
[1] https://huggingface.co/blog/manu/colpali