Hacker News


	by Nimsical 3287 days ago
	This is cool! Wondering what you're using for OCR?

jffry 3287 days ago

  For developers: Copyfish is published under the
  GPL open-source license. As OCR software, it uses
  the free OCR API from https://ocr.space/

whitten 3287 days ago

So, to answer the question above: the document containing the text is sent to an off-site server (https://ocr.space/), which performs the OCR and returns the results.
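A minimal sketch of that round trip in Python. It assumes the public OCR.space REST endpoint (`https://api.ocr.space/parse/image`), its `apikey`, `language` and `base64Image` form fields, and the `ParsedResults`/`ParsedText`/`IsErroredOnProcessing` reply fields; those names are taken on trust from the service's API docs and should be checked before use, and the function names are hypothetical. The point is only to show that the image itself leaves the machine: the client base64-encodes it, POSTs it to the remote server, and reads the recognized text out of the JSON reply.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the OCR.space REST API; verify against current docs.
OCR_SPACE_URL = "https://api.ocr.space/parse/image"


def build_ocr_request(image_bytes: bytes, api_key: str, language: str = "eng",
                      mime: str = "image/png") -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads an image for OCR."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    form = {
        "apikey": api_key,          # assumed field name
        "language": language,       # e.g. "eng"; assumed field name
        # The raw image travels to the remote server as a base64 data URI.
        "base64Image": f"data:{mime};base64,{encoded}",
    }
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(OCR_SPACE_URL, data=body, method="POST")


def extract_text(response_json: str) -> str:
    """Pull the recognized text out of the server's JSON reply (assumed shape)."""
    payload = json.loads(response_json)
    if payload.get("IsErroredOnProcessing"):
        raise RuntimeError(payload.get("ErrorMessage"))
    results = payload.get("ParsedResults") or []
    return "".join(r.get("ParsedText", "") for r in results)


def ocr_remote(image_bytes: bytes, api_key: str, language: str = "eng") -> str:
    """Full round trip: send the image off-site and get the text back."""
    req = build_ocr_request(image_bytes, api_key, language)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_text(resp.read().decode("utf-8"))
```

Calling `ocr_remote(open("shot.png", "rb").read(), "YOUR_KEY")` would perform the real request; no recognition happens on the local machine.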

tobltobs 3287 days ago

And what library does ocr.space use for OCR?

tangue 3287 days ago

I suspect they're using Tesseract, since they've written a GUI for it ( https://ocr.space/blog/p/free-ocr-windows.html ), but there's no way to find out more.

samfisher83 3287 days ago

https://github.com/A9T9/Free-OCR-Software

Based on this GitHub repo, they might be using the Microsoft OCR library.

PokemonNoGo 3287 days ago

I guess it defaults to English, then? In my experience, running Tesseract on Scandinavian text gives AAO instead of ÅÄÖ if you don't supply the correct language training data. That's quite the chicken-and-egg problem: you can't identify the language without the text, and you can't get the text without identifying the language.
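One way around the chicken-and-egg problem is to skip up-front language identification: run OCR once per candidate language model and keep the result the engine is most confident about. A minimal sketch in Python, assuming the OCR backend is injected as a hypothetical `ocr(lang)` callable that returns `(word, confidence)` pairs; mean word confidence is a heuristic, not a guarantee. With Tesseract, such a callable could wrap `pytesseract.image_to_data(image, lang=lang, output_type=pytesseract.Output.DICT)`, zipping its `text` and `conf` lists and converting `conf` to float (non-word boxes report -1, which the sketch skips).

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical OCR backend: language code -> list of (word, confidence) pairs.
OcrFn = Callable[[str], List[Tuple[str, float]]]


def mean_confidence(words: List[Tuple[str, float]]) -> float:
    """Average confidence over real words, ignoring empty/non-word boxes (conf < 0)."""
    scores = [conf for word, conf in words if word.strip() and conf >= 0]
    return sum(scores) / len(scores) if scores else 0.0


def ocr_best_language(ocr: OcrFn, candidates: Iterable[str]) -> Tuple[str, str]:
    """Run OCR once per candidate language and keep the most confident result.

    No language is identified up front: each trained model gets a chance,
    and the confidence scores decide which transcription wins.
    """
    best_lang: str = ""
    best_words: List[Tuple[str, float]] = []
    best_score = float("-inf")
    for lang in candidates:
        words = ocr(lang)
        score = mean_confidence(words)
        if score > best_score:
            best_lang, best_words, best_score = lang, words, score
    text = " ".join(w for w, c in best_words if w.strip() and c >= 0)
    return best_lang, text
```

The cost is one OCR pass per candidate. Tesseract also accepts combined models such as `-l swe+eng`, which loads several languages in a single pass and may be the cheaper option when the candidate set is small.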