Sending images through that API and then using an LLM to extract data from the OCR text output could be worth exploring.
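As a rough sketch of what that two-step pipeline might look like: the code below takes the OCR step and the LLM step as plain callables, since the actual API and model client aren't specified here. Everything named in it (`extract_fields`, `build_extraction_prompt`, `fake_ocr`, `fake_llm`, the field names) is hypothetical and only there for illustration; you would swap the two stubs for real client calls.

```python
import json
from typing import Callable, Optional


def build_extraction_prompt(ocr_text: str, fields: list[str]) -> str:
    """Build a prompt asking the LLM to return the requested fields as JSON."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the OCR text below and reply "
        f"with a single JSON object using exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        f"OCR text:\n{ocr_text}"
    )


def extract_fields(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    llm: Callable[[str], str],
    fields: list[str],
) -> dict[str, Optional[str]]:
    """Run OCR on the image, then ask the LLM to pull structured data from the text."""
    ocr_text = ocr(image_bytes)
    reply = llm(build_extraction_prompt(ocr_text, fields))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        # LLM replies are not guaranteed to be valid JSON; fall back to empty.
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Keep only the requested keys; anything missing becomes None.
    return {name: data.get(name) for name in fields}


# Stand-ins for the real services (assumptions, not real clients).
def fake_ocr(image_bytes: bytes) -> str:
    return "Invoice No: 1234\nTotal: 56.78 EUR"


def fake_llm(prompt: str) -> str:
    return '{"invoice_number": "1234", "total": "56.78 EUR"}'


result = extract_fields(
    b"fake image bytes",
    fake_ocr,
    fake_llm,
    ["invoice_number", "total", "due_date"],
)
print(result)
```

Passing the two steps in as functions keeps the pipeline testable without network access, and makes it easy to compare different OCR backends or models against the same images.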