Hosted Microsoft OCR library: Free OCR API web service (blog.a9t9.com)
72 points by kargo 3895 days ago
12 comments

Never used an OCR library before so gave it a challenge with the latest xkcd cartoon.

    curl https://ocr.a9t9.com/api/Parse/Image --data "apikey=helloworld&url=http://imgs.xkcd.com/comics/bells_theorem.png"

    { "ParsedResults": [ { "FileParseExitCode": 1, "ParsedText": "T-as IS CALLED I THEOREM. IT LAS FIRST\u2014 t: 1 wostcop n, FASTER- FIRM-LIGHT com)MCBT10N IS E6SlBLE! EL's rta\u201eNDERSTRtDlNGS ELLS THEX\u00dcI I-IPPPEN VIOLATE LOCALITY", "ErrorMessage": "", "ErrorDetails": "" } ], "OCRExitCode": 1, "IsErroredOnProcessing": false, "ErrorMessage": null, "ErrorDetails": null }

Doesn't look like it likes xkcd hand lettering or the font that approximates it. I had better luck with this...

    curl https://ocr.a9t9.com/api/Parse/Image --data "apikey=helloworld&url=http://www.uky.edu/Providers/ScannedText/page1s.jpeg"

    {"ParsedResults":[{"FileParseExitCode":1,"ParsedText":"In 1830 there were but twenty-three \r\nmiles of railroad in operation in the \r\nUnited States, and in that year Ken- \r\ntucky took the initial step in the work \r\nwest of fhe Alleghanies. An Act to \r\nincorporate the Lexington & Ohio \r\nRailway Company was approved by \r\nGov. Metcalf, Jarinary 27, 1830.. It \r\nprovided for the construction and reÔÇó \r\n","ErrorMessage":"","ErrorDetails":""}],"OCRExitCode":1,"IsErroredOnProcessing":false,"ErrorMessage":null,"ErrorDetails":null}
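
For anyone who would rather script this than use curl, here is a rough Python sketch of the same call. It only assumes what is visible in the two examples above: a form POST with `apikey` and `url` fields, and a JSON reply with a `ParsedResults` list whose entries carry `ParsedText`. The helper names `build_request` and `extract_text` are made up for illustration, and nothing here is an official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and demo key are taken from the curl calls quoted above.
API_ENDPOINT = "https://ocr.a9t9.com/api/Parse/Image"


def build_request(image_url, api_key="helloworld"):
    """Build (but do not send) a form POST like the one curl --data issues."""
    # urlencode percent-encodes the image URL; the form fields stay the same.
    body = urlencode({"apikey": api_key, "url": image_url}).encode("ascii")
    # Passing data= makes urllib issue a POST instead of a GET.
    return Request(API_ENDPOINT, data=body)


def extract_text(response_body):
    """Return the ParsedText of each entry in ParsedResults, in order."""
    payload = json.loads(response_body)
    return [r.get("ParsedText", "") for r in payload.get("ParsedResults") or []]
```

To actually send it, something like `urllib.request.urlopen(build_request(image_url)).read()` should give you the body to pass to `extract_text`, assuming the service is reachable.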

Same image in free-ocr.com returned :(

BELL'5 SECONDTI-IEOREI’I: WWINGSGWTW HPPPBI‘BOFHSFHRFTHEYVIOLMELDCNJTY.

Using the web site for a quick test, it gets me better results than Tesseract. However, it missed some words that free-ocr.com gets every single time. (free-ocr.com seems to have some voodoo magic)
Isn't this a violation of MS's EULA?
"One user may install and use copies of the software to design, develop, test and demonstrate your programs. You may not use the software on a server in a production environment."

License: http://www.microsoft.com/web/webpi/eula/windows_runtime_ocr_...

Ouch. I was not aware of this, so thanks for the info! I guess the reason for this surprisingly restrictive license is/was the version 1/first release character of the software (namespace Windows"Preview".Media.Ocr).

The good news: In Win 10 the separate library is gone and the OCR feature is a regular part of Windows (Windows.Media.Ocr namespace). Along with this, the separate OCR runtime license is gone. -> I could not find any hint that the new OcrEngine class (or Windows Store apps in general!) have similar "no server use" restrictions -> I will move the OCR app to a Win 10 platform asap.

And while I cannot speak for Microsoft, I have good reasons to assume that the OCR API service is doing Microsoft a favor by advertising the great Win 10 OCR features. My web service allows for quick prototyping and testing on any platform. But ultimately no web API can be as responsive as a native OCR solution, which is only available on the Windows platform.

I would not be surprised if the OCR engine shows up in Windows Server 2016, directly usable from ASP.NET.

Update: I confirmed that Microsoft's OCR.dll is indeed part of Windows Server 2016. More info: http://blog.a9t9.com/2015/10/microsoft-ocr-on-windows-server...
And right below that:

"ADDITIONAL LICENSING REQUIREMENTS AND/OR USE RIGHTS."

Which defines how you can use it for purposes other than development and testing.

Of course, the following clause is just as damning:

"iii. Distribution Restrictions. You may not"

"distribute Distributable Code to run on a platform other than the Windows Store or Windows Phone;"

This clause does not apply here: I assume it is intended to prevent "hacked" OCR libraries that e.g. work with Win32 apps. But as with any hosted service, I do not distribute any code.
Were those terms presented as part of the offer before money (or other consideration) changed hands?

If not, who cares what an EULA says, it's not a contract.

I tried this API with a document I had lying around which contains a lot of text in different tables. The text is pretty clear but this API was not able to parse the text to the point it's usable.

To give an example, a part of the text read "the limit" and was parsed as "he imit". This despite it being extremely clear / easy to read for a human.

Update: Took another picture and uploaded a JPG instead of the original PDF. It worked fine this time.
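
If you want to put a number on misreads like the "the limit" one, a character error rate is one common way to do it. The sketch below is only an illustration: it uses plain Levenshtein edit distance divided by the length of the expected text, and the function names are made up here, not taken from any OCR library.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def char_error_rate(expected, actual):
    """Edit distance normalised by the length of the expected text."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return edit_distance(expected, actual) / len(expected)


# The example from this comment: two characters were dropped out of nine.
print(char_error_rate("the limit", "he imit"))
```

For that example the distance is 2 (the missing "t" and "l"), so the rate is 2/9, roughly 22%.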

Can anybody recommend any good open source OCR libraries that run under *nix?
I recently started playing with tesseract.

Here's a dockerfile that will install it in a minimal alpine image.

https://github.com/wartron/docker-tesseract

Not a library but gocr has always been useful for me
I got this error; are you a victim of your own popularity?

    {"ParsedResults":[{"FileParseExitCode":-20,"ParsedText":"","ErrorMessage":"Timed out waiting for image parsing result or error generation by OCR","ErrorDetails":"System.TimeoutException: Timed out waiting for image parsing result or error generation by OCR\r\n   bei OCRInteractionLibrary.OCRInteractor.GetResultForImage(String tempPath, String imageName, FileInfo imageFileInfo, Boolean isOverlayRequired, AccesorType accesorType) in d:\\1tmp\\OCRReaderSolution914\\OCRReaderSolution\\OCRInteractionLibrary\\OCRInteractor.cs:Zeile 259."}],"OCRExitCode":3,"IsErroredOnProcessing":false,"ErrorMessage":null,"ErrorDetails":null}
Yep. -> Fixed.
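
Side note for anyone writing a client: in the timeout response above, the top-level `IsErroredOnProcessing` is still `false` even though the file's `FileParseExitCode` is -20, so checking only the top-level flag would miss it. A small sketch of a per-file check follows. It assumes that `FileParseExitCode` 1 means success, which is only inferred from the successful responses earlier in the thread, and `failed_results` is a made-up helper name.

```python
import json


def failed_results(response_body):
    """List (index, exit_code, message) for each file result that did not parse.

    Assumption: FileParseExitCode == 1 means success, as in the successful
    responses quoted in this thread; any other value is treated as a failure.
    """
    payload = json.loads(response_body)
    failures = []
    for i, result in enumerate(payload.get("ParsedResults") or []):
        code = result.get("FileParseExitCode")
        if code != 1:
            failures.append((i, code, result.get("ErrorMessage") or ""))
    return failures
```

An empty list then means every file parsed; anything else tells you which entry failed and why.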
Is it general knowledge that Microsoft's OCR libs are better than Tesseract?
Does this imply you think Microsoft>Tesseract or are you asking what the consensus is?
I'm asking what the consensus is. In fact this is the first I've heard of the Microsoft offering.
I'm really not a fan of the random highlighting of text, it just feels incredibly childish and slightly crazy.
I wonder if the Microsoft OCR uses Stroke Width Transform - http://digital.cs.usu.edu/~vkulyukin/vkweb/teaching/cs7900/P...
Does anyone know who or what organization is behind A9T9? It seems the author/source is somewhat obscured. It would be important to know before using for more than a hobby project.
a9t9 here :-)

You can find some information about me on http://blog.a9t9.com/p/about-this-blog.html

(a9t9) is a place where I experiment with side projects, and I want to keep this separate from my day job. Therefore I am somewhat stingy with the personal information on the blog. That said, you are welcome to email me for details.

Is there some pre-processing done on the images prior to doing OCR on them, and is this using Tesseract?
> is this using Tesseract

No, it isn't.

How can you create an API around this Microsoft OCR library so I can just call localhost:3424/api/Parse/Image?
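
Not the author's implementation, but the HTTP side of this can be sketched with nothing more than the Python standard library. Everything OCR-specific is left out: `run_ocr` is a hypothetical placeholder you would have to replace with a real engine, the -1 failure code is arbitrary, and the JSON shape is only modelled on the responses quoted in this thread.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def run_ocr(image_url):
    """Hypothetical placeholder: call whatever OCR engine you have here."""
    raise NotImplementedError("plug in an OCR backend")


def build_response(image_url, ocr=run_ocr):
    """Wrap an OCR call in a JSON shape modelled on the thread's responses."""
    try:
        text, code, error = ocr(image_url), 1, ""
    except Exception as exc:
        text, code, error = "", -1, str(exc)  # -1 is an arbitrary failure code
    result = {"FileParseExitCode": code, "ParsedText": text, "ErrorMessage": error}
    return {"ParsedResults": [result], "OCRExitCode": code}


class ParseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/Parse/Image":
            self.send_error(404)
            return
        # Read the form body (apikey=...&url=...) as sent by curl --data.
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        image_url = form.get("url", [""])[0]
        body = json.dumps(build_response(image_url)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=3424):
    """Create (but do not start) a server bound to localhost only."""
    return HTTPServer(("localhost", port), ParseHandler)
```

Calling `make_server().serve_forever()` would then answer POSTs to localhost:3424/api/Parse/Image. Until `run_ocr` is replaced, every request comes back with the failure shape.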