Hacker News
by arnaudsm 1126 days ago
Using LLMs with 100GB VRAM to convert PDFs to CSVs is truly depressing, but I am sure many companies will love it.

2023 office software already uses 1000x more resources than its 1990s counterpart. I bet we are ready to do that again.

9 comments

Not just PDFs with tables. It works on any semi-structured document with key-value pairs like invoices, purchase orders, receipts, tickets, forms, error messages, logs, etc.

The "Information Extraction from semistructured and unstructured documents" task is seeing a huge leap. Just 3 years ago it was very tedious to train a model to solve a single use case. Now they all work.

But if you do make the effort to train a specialised model for a single document type, the narrow model surpasses GPT3.5 and 4.

Consulting companies are paying juniors > $150k per year to do this kind of thing. In some objective sense, it's absurd, but locally, it makes more sense to use an expensive GPU than an MBA class president. And in 10 years, everyone's phone will have that much compute anyway.
It's funny, but React/Node/Electron apps will suddenly look minimalist once everyone and his brother starts adding a neural model to his app that consumes 10GB of V/RAM.
You're missing the developer time. You no longer have to spend hours (or days, perhaps weeks depending on the sources) stringing together random libs, munging and cleaning data, testing, etc etc.
I agree, computers are cheaper than engineers.

But I wonder how much more productive our economies could be if everyone was taught programming the same way we teach reading & writing, and open standards were ubiquitous.

> wonder how much more productive our economies could be if everyone was taught programming

Prompt engineering is turning coding problems into language problems. It’s conceivable that humans writing code becomes artisanal in a century.

> humans writing code becomes artisanal in a century.

At the pace we’re moving now, we’re talking a few decades away at the most, well within most people’s career span. I feel sorry for any junior coder just entering the industry.

Coding problems have always been language problems
> Coding problems have always been language problems

Pedantically, sure. The field ChatGPT is most impactfully commoditizing is low-level coding. Instead of someone giving natural language instructions to a team of humans, they're increasingly able to give them to an LLM. It's an open question how far this can scale. But we may be near the zenith of the practicality of large-scale coding expertise.

The field C is most impactfully commoditising is low-level coding. Instead of someone giving opcodes to a CPU, they’re increasingly able to give them to a compiler. It’s an open question how far this can scale. But we may be near the zenith of the practicality of large-scale coding expertise.

Pedantic, maybe, but “coding expertise” isn’t going anywhere.

If you’ve never built PDF or archive document parsing systems, you don’t know true pain.

I see it as incredible. Most PDFs that I see are basically just thin wrappers around image scans of documents that don’t exist anywhere anymore. Archives from estates, manuals, etc.

These techniques of using LLMs to clean OCR output are game-changing, because the best in class before was human-in-the-loop systems that required huge amounts of rewriting to get usable output.

Now LLMs are unlocking previously difficult data sources at a significantly lower cost.

On YouTube there are timer and stopwatch videos with millions of views. People are streaming 1080p video for something that can be implemented locally within 20 lines of code. But does it really matter? It won't make a dent in Google's revenue.

If LLMs are deployed at a large enough scale, the convenience really could justify the cost.
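
For a sense of scale, here is a minimal sketch of the kind of local countdown timer that comment has in mind. The names `format_remaining` and `countdown` are made up for illustration, and it relies on `time.sleep`, so it will drift slightly over long runs.

```python
import time


def format_remaining(seconds: int) -> str:
    """Render a number of seconds as MM:SS (negative values clamp to zero)."""
    minutes, secs = divmod(max(0, seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def countdown(total_seconds: int, tick: float = 1.0) -> None:
    """Redraw the remaining time once per tick until it reaches zero."""
    for remaining in range(total_seconds, -1, -1):
        print(f"\r{format_remaining(remaining)}", end="", flush=True)
        if remaining:
            time.sleep(tick)
    print("\nDone")


if __name__ == "__main__":
    countdown(5 * 60)  # five-minute timer
```

That is roughly 20 lines including comments, with no network traffic at all.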

We also had more secretaries and people who just retyped things all day in the '90s!
It's worth double for the increase in accuracy. Don't let me go to Amazon Mechanical poor souls Turk.

https://en.wikipedia.org/wiki/Amazon_Mechanical_Turk

The better version of this is using this massive LLM to _create a program_ that can then extract the same data from similar PDFs. That way the high cost is incurred only once.
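
As a rough sketch of what such a generated program might look like, assuming the PDF text has already been extracted and the documents share one invoice-like layout: the field names, regex patterns, and sample text below are all hypothetical, standing in for whatever the LLM would emit after seeing a few examples.

```python
import csv
import io
import re

# Hypothetical patterns, as a model might generate them for one document layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.I),
    "date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.I),
    "total": re.compile(r"Total\s*:\s*\$?([\d,]+\.\d{2})", re.I),
}


def extract_fields(text: str) -> dict:
    """Return one row of key-value pairs; missing fields become empty strings."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def to_csv(rows: list) -> str:
    """Serialize extracted rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_PATTERNS), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample of already-extracted PDF text.
SAMPLE = """ACME Corp
Invoice No: INV-0042
Date: 2023-05-17
Total: $1,234.50
"""

if __name__ == "__main__":
    print(to_csv([extract_fields(SAMPLE)]))
```

Once the patterns exist, every further document of the same type costs a few regex searches instead of an LLM call. The trade-off is that a layout change silently yields empty fields, so the expensive model is still useful as a fallback.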