Not just PDFs with tables. It works on any semi-structured document with key-value pairs like invoices, purchase orders, receipts, tickets, forms, error messages, logs, etc.
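A minimal sketch of that kind of key-value extraction. It assumes the chat model is exposed as a plain `complete(prompt) -> str` callable, which is hypothetical and does not stand for any particular vendor API. The prompt asks for the wanted keys as a JSON object, and the reply is rejected if it doesn't parse or is missing a key. The field names in the usage are illustrative.

```python
import json
from typing import Callable

def build_prompt(document: str, keys: list[str]) -> str:
    """Ask the model for exactly the requested keys as one flat JSON object."""
    return (
        "Extract the following fields from the document below. Reply with a "
        "single JSON object containing exactly these keys: "
        + ", ".join(keys)
        + ". Use null for any field that is not present.\n\n"
        + "Document:\n"
        + document
    )

def extract_fields(document: str, keys: list[str],
                   complete: Callable[[str], str]) -> dict:
    """Send the prompt to the (hypothetical) model and validate its reply."""
    # json.JSONDecodeError is a subclass of ValueError, so bad JSON raises ValueError too.
    data = json.loads(complete(build_prompt(document, keys)))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in keys if k not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return {k: data[k] for k in keys}  # drop any extra keys the model added

# Usage with a stand-in model that returns a canned answer.
def fake_model(prompt: str) -> str:
    return '{"invoice_number": "INV-0042", "total": "118.00", "due_date": null, "vendor": "ACME"}'

print(extract_fields("Invoice INV-0042\nTotal: 118.00",
                     ["invoice_number", "total", "due_date"], fake_model))
# {'invoice_number': 'INV-0042', 'total': '118.00', 'due_date': None}
```

Validating the reply is worth the few extra lines because models occasionally return malformed JSON or drop a field; one option is to retry or flag that document for human review.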
The "Information Extraction from semistructured and unstructured documents" task is seeing a huge leap. Just 3 years ago it was very tedious to train a model to solve even a single use case. Now they all work.
But if you do make the effort to train a specialised model for a single document type, the narrow model still surpasses GPT-3.5 and GPT-4.
Consulting companies are paying juniors over $150k per year to do this kind of thing. In some objective sense, it's absurd, but locally, it makes more sense to use an expensive GPU than an MBA class president. And in 10 years, everyone's phone will have that much compute anyway.
It's funny, but React/Node/Electron apps will suddenly look minimalist once everyone and his brother starts adding a neural model to his app that consumes 10GB of VRAM/RAM.
You're missing the developer time. You no longer have to spend hours (or days, perhaps weeks, depending on the sources) stringing together random libs, munging and cleaning data, testing, etc.
But I wonder how much more productive our economies could be if everyone were taught programming the same way we teach reading and writing, and open standards were ubiquitous.
> humans writing code becomes artisanal in a century.
At the pace we’re moving now, we’re talking a few decades at most, well within most people’s careers. I feel sorry for any junior coder just entering the industry.
> Coding problems have always been language problems
Pedantically, sure. The field ChatGPT is most impactfully commoditizing is low-level coding. Instead of someone giving natural language instructions to a team of humans, they're increasingly able to give them to an LLM. It's an open question how far this can scale. But we may be near the zenith of the practicality of large-scale coding expertise.
The field C is most impactfully commoditising is low-level coding. Instead of someone giving opcodes to a CPU, they’re increasingly able to give them to a compiler. It’s an open question how far this can scale. But we may be near the zenith of the practicality of large-scale coding expertise.
Pedantic, maybe, but “coding expertise” isn’t going anywhere.
If you’ve never built PDF or archive document parsing systems, you don’t know true pain.
I see it as incredible. Most PDFs that I see are basically just thin wrappers around image scans of documents that don’t exist anywhere else anymore: archives from estates, manuals, etc.
This technique of using LLMs to clean OCR output is game-changing, because the previous best in class was human-in-the-loop systems that required huge amounts of rewriting to get usable output.
Now LLMs are unlocking previously difficult data sources at significantly lower cost.
On YouTube there are timer and stopwatch videos with millions of views: people stream 1080p video for something that could be implemented locally in 20 lines of code. But does it really matter? It won't make a dent in Google's revenue.
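For a sense of scale, here is a minimal terminal countdown timer in Python that comes in under the 20 lines mentioned above. It is a sketch; the function names are illustrative.

```python
import time
from typing import Callable

def format_remaining(seconds: int) -> str:
    """Render a non-negative number of seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"

def countdown(seconds: int,
              sleep: Callable[[float], None] = time.sleep,
              show: Callable[[str], None] = print) -> None:
    """Display the remaining time once per second, then announce the end."""
    for remaining in range(seconds, 0, -1):
        show(format_remaining(remaining))
        sleep(1)
    show("00:00 time's up!")
```

Call `countdown(25 * 60)` for a 25-minute timer. `sleep` and `show` are injectable only so the loop can be tested without waiting. Because each tick sleeps a full second on top of the time spent printing, the timer drifts slightly over long runs; a more careful version would recompute the remaining time from `time.monotonic()`.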
If LLMs are deployed in large enough scale, the convenience really could justify the cost.
The better version of this is using this massive LLM to _create a program_ that can then extract the same data from similar PDFs. That way the high cost is incurred only once.
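A sketch of what such a generated program might look like. It assumes the PDF text has already been extracted (for example with a PDF text-extraction library) and that the documents share one layout. The field names and regex patterns below are hypothetical, and a regex extractor is only one plausible form the generated program could take.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for one invoice layout. In the workflow described above,
# an LLM would draft these once and a human would review them before reuse.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#|Number)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*(?:Due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}

def extract(text: str) -> Dict[str, Optional[str]]:
    """Apply each field's pattern to text already pulled out of the PDF."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None  # None when the field isn't found
    return result

sample = "ACME Corp\nInvoice No: INV-2023-001\nDate: 2023-05-14\nTotal Due: $1,250.00\n"
print(extract(sample))
# {'invoice_number': 'INV-2023-001', 'date': '2023-05-14', 'total': '1,250.00'}
```

Running this on thousands of similar files costs next to nothing compared with an LLM call per document. A file whose layout differs comes back with `None` fields, which could serve as the trigger to fall back to the LLM for that one document.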