I worked on this a bit 1-2 years ago. Back then, LLMs weren't really up to the task, but I found them OK for suggestions that a human double-checks. That brings us to the "Ironies of Automation", though: human oversight of automation through a review process doesn't really work. It's a paper worth reading.

We tried several dedicated services for extracting structured data and factoids like that from documents: first Google Document AI, then a dedicated provider focused solely on our niche. Back then, the niche provider gave the best results.

There wasn't enough budget to go deeper into this, so we just reverted to doing it manually. But I think a really cool way to do this would be a user-friendly UI where reviewers can see the suggestions, and the text snippets they were extracted from, as they skim through the document, with a simple way to modify and accept them. I think that'd work to scale the process quite a bit. Basically, it focuses the human's attention on the relevant parts of the document.
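To make the idea concrete, here's a rough sketch of the data such a review UI could sit on top of, in Python. Everything in it (`Suggestion`, `snippet`, `review_order`, `accept`, `modify`, the field names) is made up for illustration and not taken from any of the services mentioned. It assumes the extractor can report character offsets for the text each value came from, which not every tool does.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Suggestion:
    """One extracted value plus a pointer to its evidence (hypothetical schema)."""
    field: str               # e.g. "total_due"; field names are assumptions
    value: str               # value proposed by the extractor
    start: int               # character offset where the evidence begins
    end: int                 # character offset where the evidence ends (exclusive)
    status: str = "pending"  # "pending", "accepted", or "modified"


def snippet(text: str, s: Suggestion, context: int = 30) -> str:
    """Return the evidence with surrounding text, evidence marked in [brackets]."""
    lo = max(0, s.start - context)
    hi = min(len(text), s.end + context)
    return f"{text[lo:s.start]}[{text[s.start:s.end]}]{text[s.end:hi]}"


def review_order(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Sort suggestions by document position so they appear as the reviewer skims."""
    return sorted(suggestions, key=lambda s: s.start)


def accept(s: Suggestion) -> Suggestion:
    """Reviewer confirms the suggested value as-is."""
    return replace(s, status="accepted")


def modify(s: Suggestion, new_value: str) -> Suggestion:
    """Reviewer corrects the value; the evidence span is kept for auditing."""
    return replace(s, value=new_value, status="modified")
```

The point of keeping `start` and `end` around is that the reviewer never has to trust a value without seeing where it came from, and a correction still records which passage it was based on.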

Haven't worked in this space since then, but I'm pretty bearish on fully automated fact extraction. Getting stuff in contracts and invoices wrong is typically not acceptable. I think a solid human-in-the-loop approach is probably still the way to go.