Hacker News
by wrs 694 days ago
The big news for me here is the 16k output token limit. The models keep increasing the input limit to outrageous amounts, but output has been stuck at 4k.

I did a project to summarize complex PDF invoices (not “unstructured” data, but “idiosyncratically structured” data, as each vendor has a completely different format). GPT-4o did an amazing job at the extraction of line items, but I had to do a heuristic layer on top to break up the PDFs into small chunks so the output didn’t overflow.
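A rough sketch of what such a chunking layer could look like, for anyone curious. The actual heuristic isn't described above, so everything here is an assumption: `chunk_pages` is a hypothetical helper, pages are assumed to be already extracted as text, and the "one line of input may become one line item of about 40 output tokens" estimate is a guess you would tune per vendor.

```python
from typing import List


def chunk_pages(pages: List[str], max_output_tokens: int = 4096,
                tokens_per_line: int = 40) -> List[List[int]]:
    """Group page indices so each chunk's *estimated* output fits the budget.

    Assumption: every non-empty line on a page may turn into one extracted
    line item costing roughly `tokens_per_line` output tokens.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, text in enumerate(pages):
        non_empty = sum(1 for line in text.splitlines() if line.strip())
        estimate = non_empty * tokens_per_line
        # Start a new chunk when adding this page would exceed the budget.
        if current and used + estimate > max_output_tokens:
            chunks.append(current)
            current, used = [], 0
        # A single oversized page still gets its own chunk.
        current.append(idx)
        used += estimate
    if current:
        chunks.append(current)
    return chunks
```

Each inner list is a group of page indices to send in one request. A page that is too big on its own still goes out alone, so it would need a finer split (by table rows, say) to be safe.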


My excitement is now tempered a bit. I just tried one of the too-big invoices with the new model. After successfully getting a little farther than 4o could, it just went into an endless loop of repeating the same line item until it ran out of output tokens. So…not really an improvement!
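One cheap guard against this failure mode is to scan the output for a run of identical consecutive lines and cut it off there. This is only a sketch under assumptions: `truncate_repetition` is a hypothetical helper, the output is assumed to be one line item per line, and the threshold of 3 is arbitrary. Real invoices can contain identical consecutive items, so a hit should be treated as a flag for review, not proof of a loop.

```python
from typing import List, Tuple


def truncate_repetition(lines: List[str],
                        max_repeats: int = 3) -> Tuple[List[str], bool]:
    """Return (kept_lines, looped).

    If the same line appears more than `max_repeats` times in a row, stop
    reading, collapse that run to a single copy, and report looped=True.
    """
    kept: List[str] = []
    run = 0
    for line in lines:
        run = run + 1 if kept and line == kept[-1] else 1
        if run > max_repeats:
            # `kept` currently ends with `max_repeats` copies; keep only one.
            return kept[: len(kept) - (max_repeats - 1)], True
        kept.append(line)
    return kept, False
```

When `looped` comes back true, the sensible follow-up is probably to retry with a smaller chunk instead of trusting whatever came before the loop.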
This has been my experience with any model with a large response token limit. I've had to work around this by running it through several times with specific questions about the data: extract text, extract tables, extract <specific detail>. They seem to do well on large input though so I just concat all the extracted info and things seem to work just fine.
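The multi-pass workaround can be sketched roughly like this. It is not tied to any particular API: `ask_model` is a hypothetical callable you would supply yourself (it takes a question and the document and returns the model's text), and the `## question` headers in the combined output are just one arbitrary way to label the sections.

```python
from typing import Callable, List


def multi_pass_extract(document: str, questions: List[str],
                       ask_model: Callable[[str, str], str]) -> str:
    """Run one focused extraction pass per question, then concatenate.

    Each pass asks for a narrow slice of the data, so no single response
    has to be very long.
    """
    sections = []
    for question in questions:
        answer = ask_model(question, document)
        sections.append(f"## {question}\n{answer.strip()}")
    # The combined text can then be fed back in as (large) input.
    return "\n\n".join(sections)
```

A call might look like `multi_pass_extract(doc, ["extract text", "extract tables"], ask_model)`, with `ask_model` wrapping whichever client you use.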
Did you get a different experience later on?
If all that AI could do was to turn less than structured data into structured data, it would still be the biggest deal in computation since the transistor.
But only if it could do it with reasonable accuracy. The problem is that AI is one of the few technologies that doesn't just fail to do its job; it fails and you might never notice until the error is already very costly, if it hallucinated something crazy.
Surely this is still a massive problem for any real-world enterprise use case, unless you throw a human in the loop (which kills the productivity benefit) or you stamp a massive disclaimer on the output.
Well, this thing I’m doing isn’t good enough for an audit or the like, but it’s good enough for sanity checking the budget and flagging things for further checking. And without the AI, you just wouldn’t do it at all, because it would take weeks to write a “parser” for these PDFs.

Actually, it doesn’t even need PDFs. It works just about as well if you just feed it PNGs of the pages. Crazy.

>AI is one of the few technologies that doesn't just fail to do its job but it fails and you might never notice until the error is already very costly if it hallucinated something crazy.

That's because this is the tool people use for informal, unstructured data. If you could build something that was always accurate at the task, you would have solved it formally.

Giving an LLM any task involving numbers is quite a gamble. Still, I assume structuring content is exactly where many practical applications lie, perhaps just as a preprocessor. You just need a way to validate the results...
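For invoices specifically, one plausible validation step is plain arithmetic: check that each line's quantity times unit price matches its amount, and that the amounts add up to the stated total. A minimal sketch, assuming the extracted items are dicts with `quantity`, `unit_price`, and `amount` as strings (these field names and `validate_invoice` itself are hypothetical, not something from the thread):

```python
from decimal import Decimal
from typing import Dict, List


def validate_invoice(items: List[Dict[str, str]], stated_total: str,
                     tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return a list of human-readable problems; empty means checks passed."""
    problems: List[str] = []
    running = Decimal("0")
    for i, item in enumerate(items):
        qty = Decimal(item["quantity"])
        price = Decimal(item["unit_price"])
        amount = Decimal(item["amount"])
        # Per-line check: quantity x unit price should equal the amount.
        if abs(qty * price - amount) > tolerance:
            problems.append(f"item {i}: {qty} x {price} != {amount}")
        running += amount
    # Whole-invoice check: line amounts should add up to the stated total.
    if abs(running - Decimal(stated_total)) > tolerance:
        problems.append(f"total: items sum to {running}, invoice says {stated_total}")
    return problems
```

This only catches errors that break the arithmetic. A hallucinated line item that happens to be internally consistent would still need the total check, or a human, to notice it.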
>I had to do a heuristic layer on top to break up the PDFs into small chunks so the output didn’t overflow

How do you stitch the outputs of all chunks without losing the overall context?

The output is just individual line items from the invoices, so all you have to do is concatenate the outputs of the chunks. If there had been data that crossed a page boundary, it would have been harder!
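As a sketch of how simple that stitching step can be: assuming each chunk's response is a JSON array of line items (an assumption for illustration; the actual output format isn't stated here), the chunks can be parsed and flattened in order. `stitch_chunks` is a hypothetical helper.

```python
import json
from typing import List, Tuple


def stitch_chunks(chunk_outputs: List[str]) -> Tuple[List[dict], List[int]]:
    """Flatten per-chunk JSON arrays into one list of line items.

    Returns (items, failed), where `failed` holds the indices of chunks
    whose output was not a parseable JSON array, so they can be retried.
    """
    items: List[dict] = []
    failed: List[int] = []
    for index, raw in enumerate(chunk_outputs):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            failed.append(index)
            continue
        if not isinstance(parsed, list):
            failed.append(index)
            continue
        items.extend(parsed)
    return items, failed
```

Because the items are independent, order-preserving concatenation is enough. Nothing here handles a line item split across two chunks.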
Have you written about this anywhere? Would love to know more about the process you're using!