by voiper1
1961 days ago
I looked into OCR a while ago for some hundreds of thousands of pages of PDF. All the hosted offerings would have ended up costing quite a bit. After looking at the options and running a few tests, I figured I'd use https://github.com/jbarlow83/OCRmyPDF
It converts each PDF page to an image for Tesseract and then recreates the PDF with the text copy-able. It won't identify the address part of a driver's license, but that wasn't necessary for this project.
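For anyone wanting to try the same thing, here is a rough sketch of a batch run, not the exact setup used above. It assumes the `ocrmypdf` command-line tool is installed and on the PATH; the function names `build_ocr_command` and `ocr_directory` and the directory layout are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> List[str]:
    """Build the ocrmypdf invocation for one file (hypothetical helper).

    --skip-text leaves pages that already have a text layer untouched.
    """
    return ["ocrmypdf", "--language", language, "--skip-text", str(src), str(dst)]


def ocr_directory(in_dir: Path, out_dir: Path) -> None:
    """OCR every PDF in in_dir, writing searchable copies to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        if dst.exists():
            # Already converted on a previous run, so the job can be resumed.
            continue
        # check=True raises CalledProcessError if ocrmypdf exits non-zero.
        subprocess.run(build_ocr_command(src, dst), check=True)
```

Skipping files that already exist in the output directory matters at this scale, since a run over hundreds of thousands of pages will almost certainly be interrupted at some point.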
|
At the time there were not many apps out there, and we partnered with a 3rd party service that did the OCR off the app, so our conversion quality was (at the time) close to state of the art for a mobile app once people got comfortable with this method (which of course not everyone did).
We made some decent money from it as a side project, but I also started to appreciate the sheer complexity of OCR.
We spent a lot of time fine-tuning the pre-processing before hitting the OCR engine (e.g. orientation, shading); small changes here made a huge impact on performance. We also built various prompts to guide the user on how to take the photo. Managing expectations was something we were very conscious of, and it was tough.
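To give a feel for what "orientation" and "shading" pre-processing can mean, here is a minimal pure-Python sketch. It is not the pipeline described above, just one common approach: a quarter-turn rotation, and a local-mean (adaptive) threshold that copes with uneven lighting. The function names and the `radius` and `offset` parameters are assumptions chosen for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image a quarter turn clockwise (a simple orientation fix)."""
    return [list(row) for row in zip(*img[::-1])]


def adaptive_threshold(img: Image, radius: int = 1, offset: int = 10) -> Image:
    """Binarize each pixel against the mean of its local window.

    Comparing against a local mean instead of one global cutoff is what
    makes the result tolerant of shading across the page.
    """
    h, w = len(img), len(img[0])
    # Integral image: integral[y][x] holds the sum of img[0:y][0:x].
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        out_row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            # Pixels clearly darker than their surroundings count as ink.
            out_row.append(0 if img[y][x] < mean - offset else 255)
        out.append(out_row)
    return out
```

In practice a library such as OpenCV would do this faster, but the idea is the same: clean up the image so the OCR engine sees upright, evenly lit, high-contrast text.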
The unexpected (but rewarding) use case was when we found that blind people had started to use the app to help with their daily lives. There were only a few, but it was making a real impact for them, so we prioritized a few features for this segment, knowing we were drifting away from maximizing revenue. We were cool with this as it was not a primary income source.
In the end we all moved on to other things, more apps and services came on the market, and Google Lens became a thing, so we decided to sunset the product and did our best to manage customers through the process.
A rewarding experience overall. Lots of lessons were learned that I have used elsewhere in my life since, and I ticked 'Build an app that made thousands of $' off my bucket list (which, yeah, I should probably review!).