| HN Mirror


	by fred123 683 days ago
	macOS Live Text is incredible. Mac only though

eigenvalue 683 days ago

Yes, I imagine it's using the same OCR model as the iPhone, which is really incredibly good. In fact, it's so good that I made a little app for fun just to be able to use it for OCRing whole PDF books:

https://apps.apple.com/us/app/super-pdf-ocr/id6479674248

kergonath 683 days ago

Interesting! I’ll give it a try, I have a couple of large books to OCR (to be honest, the name in all caps with underscores is not really encouraging).

In your experience, how well does the OCR engine handle multi-column documents?

eigenvalue 683 days ago

The iOS app would likely not handle two-column text very well. I really made the iOS app on a lark for personal use, the whole thing took like 2 hours, and I'd never even made a Swift or iOS app before. It actually took longer to submit it to the App Store than it did to create it from scratch, because all the hard stuff in the app uses built-in iOS APIs for file loading, PDF reading, screenshot extraction, OCR, NLP for sentence splitting, and sharing the output.

I think the project I submitted here would do that better, particularly if you revised the first prompt to include an instruction about handling two-column text (like "Attempt to determine if the extracted text actually came from two columns of original text; if so, reformat accordingly.").

The beauty of this kind of prompt engineering code is that you can literally change how the program works just by editing the text in the prompt templates!
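
As a rough illustration of that idea, here is a minimal Python sketch of a prompt template with an optional two-column instruction. The names `BASE_PROMPT`, `TWO_COLUMN_RULE`, and `build_prompt` are hypothetical, and the base wording is made up for the example; it is not the text the project actually uses. Only the two-column sentence is taken from the comment above.

```python
# Hypothetical template text; the real project's prompts will differ.
BASE_PROMPT = (
    "Correct OCR-induced errors in the text below. "
    "Preserve the original meaning and wording.\n"
    "{extra_instructions}"
    "--- OCR TEXT ---\n"
    "{ocr_text}"
)

# The extra instruction quoted in the comment above.
TWO_COLUMN_RULE = (
    "Attempt to determine if the extracted text actually came from two "
    "columns of original text; if so, reformat accordingly.\n"
)


def build_prompt(ocr_text: str, two_column: bool = False) -> str:
    """Fill the template, optionally adding the two-column instruction."""
    extra = TWO_COLUMN_RULE if two_column else ""
    return BASE_PROMPT.format(extra_instructions=extra, ocr_text=ocr_text)


if __name__ == "__main__":
    print(build_prompt("Lorem ipsum dolor sit amet", two_column=True))
```

Changing the program's behavior then amounts to editing `TWO_COLUMN_RULE` or the base text; no other code has to change.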

kergonath 683 days ago

Thanks, I’ll try to play with this. Thanks also for keeping us updated, your work is very interesting!

wahnfrieden 683 days ago

Sadly no bounding rects

fred123 683 days ago

You can get them through the Vision API (Swift/Objective-C/AppleScript)

_boffin_ 682 days ago

You’re forgetting about Python and TypeScript/JavaScript. PyObjC and whatever it is for TypeScript.
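
For the Python route, a hedged sketch of what this might look like through PyObjC is below. It assumes macOS with the `pyobjc-framework-Vision` package installed; the function names `recognize_with_boxes` and `to_pixel_rect` are made up for this example. Vision reports each bounding box in normalized coordinates with the origin at the bottom-left of the image, so the helper converts to top-left pixel coordinates.

```python
def to_pixel_rect(norm_rect, image_width, image_height):
    """Convert a normalized, bottom-left-origin rect (x, y, w, h)
    into a top-left-origin pixel rect (left, top, width, height)."""
    x, y, w, h = norm_rect
    left = x * image_width
    top = (1.0 - y - h) * image_height  # flip the vertical axis
    return (left, top, w * image_width, h * image_height)


def recognize_with_boxes(image_path):
    """Return [(text, normalized_rect)] for each recognized line of text."""
    # Imported lazily: these modules only exist on macOS with PyObjC.
    import Vision
    from Foundation import NSURL

    url = NSURL.fileURLWithPath_(image_path)
    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
    request = Vision.VNRecognizeTextRequest.alloc().init()
    ok, error = handler.performRequests_error_([request], None)
    if not ok:
        raise RuntimeError(f"Vision request failed: {error}")

    lines = []
    for observation in request.results() or []:
        candidates = observation.topCandidates_(1)
        if len(candidates) == 0:
            continue
        box = observation.boundingBox()
        rect = (box.origin.x, box.origin.y, box.size.width, box.size.height)
        lines.append((candidates[0].string(), rect))
    return lines
```

Something like `to_pixel_rect(rect, width, height)` would then be applied to each result before drawing overlays on the source image.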

wahnfrieden 683 days ago

Yes but it's relatively shit

The Vision API can't even read vertical Japanese text

fred123 683 days ago

Fair enough. There are some new OCR APIs in the next macOS release. I wonder if the model has been improved.

wahnfrieden 682 days ago

They're just a new Swift-only interface to the same underlying behaviors, no apparent improvement. I was hoping for more given the visionOS launch but alas

What I'm trying now is combining ML Kit v2 with Live Text: Apple's output for the accurate paragraphs of text, then custom alignment of that text against the ML Kit v2 output to add bounding rects, guessing corrections for parts ML Kit missed or misidentified (I use ML Kit only for bounding rects and expect it to make mistakes on the text recognition).
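
One way such an alignment could be sketched, purely as an assumption about the approach and not the commenter's actual code, is a word-level diff with Python's standard `difflib`. The function name `attach_boxes` and the input shapes are hypothetical: one list holds the accurate words, the other holds `(text, box)` pairs from the engine that provides rects.

```python
from difflib import SequenceMatcher


def attach_boxes(accurate_words, boxed_words):
    """Pair each accurate word with a box from the noisier OCR output.

    accurate_words: list of str from the trusted recognizer.
    boxed_words: list of (text, box) from the recognizer that has rects.
    Returns a list of (accurate_word, box_or_None).
    """
    clean = [w.lower() for w in accurate_words]
    noisy = [text.lower() for text, _ in boxed_words]
    matcher = SequenceMatcher(a=clean, b=noisy, autojunk=False)

    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            # Identical words, or a one-for-one misread: reuse the box.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                result.append((accurate_words[i], boxed_words[j][1]))
        elif tag in ("replace", "delete"):
            # No reliable counterpart: keep the word, leave the box unknown.
            for i in range(i1, i2):
                result.append((accurate_words[i], None))
        # "insert": words seen only by the noisy engine are dropped.
    return result


if __name__ == "__main__":
    accurate = ["Sadly", "no", "bounding", "rects"]
    boxed = [("Sadly", (0, 0, 10, 5)), ("no", (12, 0, 4, 5)),
             ("bounclmg", (18, 0, 20, 5)), ("rects", (40, 0, 10, 5))]
    for word, box in attach_boxes(accurate, boxed):
        print(word, box)
```

Words left with `None` could later get a rect interpolated from their neighbors; that step is omitted here.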

I also investigated private APIs for extracting rects from Live Text. It looks possible, the APIs are there (it has methods or properties which give bounding rects as is obviously required for Live Text functionality), but I can't wrap my head around accessing them yet.

fred123 682 days ago

I feel like text detection is much better covered by the various ML models discussed elsewhere in the comments. Maybe you can combine those with Live Text. I found Tesseract pretty ok for text detection as well but I don’t know if any of the models are good for vertical text.
