by StellarScience 319 days ago
With the latest Microsoft Word, if you open a PDF that is a scanned image of a document and convert it to Word format, it does a pretty decent job of not only OCR (optical character recognition) but also picking matching fonts for various sections.

I just tested this with my internet connection disabled and it still worked. Since it's doing local processing, I suspect it uses traditional OCR algorithms rather than LLMs.

As the article concludes, LLMs aren't magic; they're just one useful tool to include in your toolbox.


It's pretty easy to imagine a messy, organically evolved, ad hoc but broadly adopted open ecosystem in which LLMs are surrounded by a bewildering array of Node-like domain-specific extensions.

Security concerns aside (...) that sounds pretty useful.

Right, for example early LLMs were notoriously bad at math, as they had been trained on language. They'd get simple math right, likely due to "rote memorization", but couldn't do basic arithmetic with 3-digit numbers. The common AI agents seem much better now. I suspect they added separate math processing logic and trained the LLMs to recognize when and how to delegate to it, though I'm not certain of that.
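To make the delegation idea concrete, here is a minimal sketch of what a separate math backend could look like. It is only an illustration, not how any particular vendor does it: the `calculate` and `handle_model_output` names and the `{"tool": ..., "input": ...}` message shape are hypothetical. The point is just that the model emits a structured request and ordinary code does the exact arithmetic.

```python
import ast
import operator

# Hypothetical calculator "tool": evaluates plain arithmetic exactly,
# instead of having a language model guess the digits.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculate(expression: str):
    """Evaluate +, -, *, / over numeric literals; reject anything else."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def handle_model_output(output: dict) -> str:
    """Route a model message (assumed shape) to the calculator or pass text through."""
    if output.get("tool") == "calculator":
        return str(calculate(output["input"]))
    return output.get("text", "")
```

For example, `handle_model_output({"tool": "calculator", "input": "123 * 456"})` hands the expression to `calculate`, while a message carrying only `"text"` passes through unchanged. Walking the syntax tree rather than calling `eval` keeps the tool limited to arithmetic.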

Similarly, coding-focused LLMs can access backend engines that actually run the code and get feedback, either to show the user or to internally iterate.
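A rough sketch of that run-and-get-feedback step, assuming the generated code is Python and that a child interpreter is an acceptable place to run it. The `run_snippet` name and the returned dictionary are hypothetical, and a real service would need proper sandboxing, which this does not provide.

```python
import subprocess
import sys


def run_snippet(source: str, timeout: float = 5.0) -> dict:
    """Run Python source in a child interpreter and report what happened."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,  # collect stdout and stderr for feedback
            text=True,            # decode output as str rather than bytes
            timeout=timeout,      # stop runaway code
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

The `stderr` text (a traceback, say) is the feedback that could be shown to the user or fed back to the model for another attempt.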

Having a whole host of such backend processors would be great. Users would still only ever have to interact using natural language, but would get the power of all these specialized tools in the backend. There are some tasks LLMs can do that special-purpose algorithms may do better, faster, and/or with less energy usage.