by visarga
1124 days ago
Not just PDFs with tables. It works on any semi-structured document with key-value pairs, like invoices, purchase orders, receipts, tickets, forms, error messages, logs, etc. The "Information Extraction from semistructured and unstructured documents" task is seeing a huge leap: just 3 years ago it was very tedious to train a model to solve a single use case, and now they all work. But if you do make the effort to train a specialised model for a single document type, the narrow model still surpasses GPT-3.5 and GPT-4.