[1] https://www.johnsnowlabs.com/spark-ocr/
[2] https://www.adobe.io/apis/documentcloud/dcsdk/pdf-extract.ht...