

by visarga 678 days ago

What is your metric and score? Maybe you have reached perfect reliability, but in my experience information extraction is about 90% accurate in real-life scenarios, and you can't reliably know which 90% is correct.

In critical scenarios, companies won't risk 100% automation. A human stays in the loop, so the cost doesn't go down much.

I work on LLM-based information extraction and use my own evaluation sets. That's how I obtained the 90% score. I tested on many document types. It looks like magic when you try an invoice in GPT-4o and skim the outputs, but if you spend 15 minutes checking them, you find issues.
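For illustration only, here is a minimal sketch of one way such a score could be computed: field-level exact match against a hand-labeled evaluation set. The function name `field_accuracy`, the field names, and the sample values are all made up for this example; the actual metric behind the 90% figure is not specified in the comment.

```python
def field_accuracy(predicted: list[dict], expected: list[dict]) -> float:
    """Share of labeled fields whose extracted value exactly matches the label."""
    correct = 0
    total = 0
    for pred, gold in zip(predicted, expected):
        for field, gold_value in gold.items():
            total += 1
            # A missing field counts as an error, same as a wrong value.
            if pred.get(field) == gold_value:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical hand-labeled documents and hypothetical model outputs.
expected = [
    {"invoice_no": "A-17", "total": "1500.00"},
    {"invoice_no": "B-02", "total": "80.50"},
]
predicted = [
    {"invoice_no": "A-17", "total": "1500.00"},
    {"invoice_no": "B-02", "total": "80,50"},  # separator error
]

score = field_accuracy(predicted, expected)  # 3 of 4 fields match: 0.75
```

Note that an aggregate score like this says how often the extraction is wrong, not which fields are wrong in a given document.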

Can you risk an OCR error that confuses a dot with a comma and sends 1000x more money in a bank transfer? Or a medical data extraction that goes wrong, where someone suffers because there was no human in the document ingestion pipeline to see what was happening?
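To make the 1000x figure concrete, here is a small sketch, assuming a hypothetical helper `parse_amount` and a made-up amount string. The same characters parse to values a factor of 1000 apart depending on whether the dot is read as a decimal separator or as a thousands separator.

```python
from decimal import Decimal


def parse_amount(text: str, decimal_sep: str) -> Decimal:
    """Parse an amount string, given which character is the decimal separator."""
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


# Same text, two readings of the separator.
dot_is_decimal = parse_amount("1.500", decimal_sep=".")    # one and a half
dot_is_thousands = parse_amount("1.500", decimal_sep=",")  # fifteen hundred

ratio = dot_is_thousands / dot_is_decimal  # numerically equal to 1000
```

Both readings are valid amounts, so a format check alone cannot tell which one the document meant.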