by visarga 1124 days ago
Not just PDFs with tables. It works on any semi-structured document with key-value pairs like invoices, purchase orders, receipts, tickets, forms, error messages, logs, etc.

The "Information Extraction from semi-structured and unstructured documents" task is seeing a huge leap. Just 3 years ago it was very tedious to train a model to solve a single use case. Now all of these use cases work.

But if you do make the effort to train a specialised model for a single document type, the narrow model surpasses GPT-3.5 and GPT-4.