| HN Mirror

	by wrs 741 days ago
	The big news for me here is the 16k output token limit. The models keep increasing the input limit to outrageous amounts, but output has been stuck at 4k. I did a project to summarize complex PDF invoices (not “unstructured” data, but “idiosyncratically structured” data, as each vendor has a completely different format). GPT-4o did an amazing job at the extraction of line items, but I had to do a heuristic layer on top to break up the PDFs into small chunks so the output didn’t overflow.

4 comments

wrs 741 days ago

My excitement is now tempered a bit. I just tried one of the too-big invoices with the new model. After successfully getting a little farther than 4o could do, it just went into an endless loop of repeating the same line item until it ran out of output tokens. So…not really an improvement!
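
One way to catch the repeated-line-item loop described above is to scan the output for long runs of identical consecutive lines and cut the output there. This is only a rough sketch: the function name `find_repeat_run`, the threshold of 5, and the sample line items are all made up for illustration, and a real invoice can legitimately repeat a line a few times, so the threshold would need tuning.

```python
from typing import Optional

def find_repeat_run(lines: list[str], min_repeats: int = 5) -> Optional[int]:
    """Return the index where a run of at least min_repeats identical,
    consecutive, non-blank lines starts, or None if there is no such run."""
    run_start = 0
    for i in range(1, len(lines) + 1):
        # A run ends at the end of the list or when the line changes.
        if i == len(lines) or lines[i] != lines[run_start]:
            if lines[run_start].strip() and i - run_start >= min_repeats:
                return run_start
            run_start = i
    return None

# Made-up model output: two normal line items, then the same item six times.
output = ["Widget A, 2, 10.00", "Widget B, 1, 5.00"] + ["Widget C, 3, 7.50"] * 6

start = find_repeat_run(output)
# Keep one copy of the repeated line and drop the rest of the loop.
cleaned = output if start is None else output[: start + 1]
```

A hit is best treated as a flag for re-running that chunk or checking it by hand, since the model may have skipped real line items once it started looping.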

film42 741 days ago

This has been my experience with any model with a large response token limit. I've had to work around this by running it through several times with specific questions about the data: extract text, extract tables, extract <specific detail>. They seem to do well on large input though so I just concat all the extracted info and things seem to work just fine.
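
A minimal sketch of that multi-pass approach, assuming the model call is wrapped in a function you supply yourself. `ask_model`, `multi_pass_extract`, the question wording, and the sample document are hypothetical placeholders, not any vendor's API.

```python
from typing import Callable, Sequence

# Hypothetical focused questions, one per pass.
QUESTIONS = (
    "Extract all plain text from this document.",
    "Extract every table from this document.",
)

def multi_pass_extract(
    document: str,
    ask_model: Callable[[str, str], str],
    questions: Sequence[str] = QUESTIONS,
) -> str:
    """Run one focused pass per question and concatenate the answers."""
    sections = []
    for question in questions:
        # Each call asks for one narrow thing, so each answer stays short.
        answer = ask_model(question, document)
        sections.append(f"## {question}\n{answer.strip()}")
    return "\n\n".join(sections)

# Stand-in for a real model call so the sketch runs offline.
def fake_model(question: str, document: str) -> str:
    return f"[{len(document)} chars seen] answer to: {question}"

combined = multi_pass_extract("placeholder invoice text", fake_model)
```

The concatenated `combined` string can then go back in as input for a final pass, which relies on the models handling large inputs better than large outputs.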

mukhtharcm 737 days ago

Did you get any different experience later on?

delichon 741 days ago

If all that AI could do was to turn less than structured data into structured data, it would still be the biggest deal in computation since the transistor.

jascha_eng 741 days ago

But only if it could do it with reasonable accuracy. The problem is that AI is one of the few technologies that doesn't just fail to do its job but it fails and you might never notice until the error is already very costly if it hallucinated something crazy.

monkeydust 741 days ago

Surely this is still a massive problem for any real-world enterprise use case unless you throw a human in the loop (which kills the productivity benefit) or you stamp a massive disclaimer on the output.

wrs 741 days ago

Well, this thing I’m doing isn’t good enough for an audit or the like, but it’s good enough for sanity checking the budget and flagging things for further checking. And without the AI, you just wouldn’t do it at all, because it would take weeks to write a “parser” for these PDFs.

Actually, it doesn’t even need PDFs. It works just about as well if you just feed it PNGs of the pages. Crazy.

GaggiX 741 days ago

>AI is one of the few technologies that doesn't just fail to do its job but it fails and you might never notice until the error is already very costly if it hallucinated something crazy.

That's because this is what we use to deal with informal and unstructured data. If you could build something that was always accurate at the task, you would have solved it formally.

raxxorraxor 740 days ago

Giving an LLM any task involving numbers is quite a gamble. Still, I assume structuring content is exactly where many practical applications lie, perhaps just as a preprocessor. You just need a way to validate the results...
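
For invoice-style data, one cheap validation is plain arithmetic: check that quantity times unit price matches each line amount, and that the amounts add up to the stated total. The sketch below assumes the extracted items are dicts with `quantity`, `unit_price`, and `amount` keys. Those field names, the function `check_invoice`, and the sample numbers are invented for illustration.

```python
from decimal import Decimal

def check_invoice(
    line_items: list[dict],
    stated_total: str,
    tolerance: Decimal = Decimal("0.01"),
) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    computed_total = Decimal("0")
    for n, item in enumerate(line_items, start=1):
        # Go through str() so floats from JSON do not carry binary rounding noise.
        qty = Decimal(str(item["quantity"]))
        unit = Decimal(str(item["unit_price"]))
        amount = Decimal(str(item["amount"]))
        if abs(qty * unit - amount) > tolerance:
            problems.append(f"line {n}: {qty} x {unit} != {amount}")
        computed_total += amount
    if abs(computed_total - Decimal(stated_total)) > tolerance:
        problems.append(
            f"total: items sum to {computed_total}, invoice says {stated_total}"
        )
    return problems

# Made-up extraction result with one bad amount on the second line.
items = [
    {"quantity": 2, "unit_price": "10.00", "amount": "20.00"},
    {"quantity": 1, "unit_price": "5.00", "amount": "5.50"},
]
problems = check_invoice(items, "25.00")
```

This only catches numbers that are inconsistent with each other. A hallucinated line that happens to add up would still pass, so it works as a flag for further checking and not as proof of correctness.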

sanmon3186 740 days ago

>I had to do a heuristic layer on top to break up the PDFs into small chunks so the output didn’t overflow

How do you stitch the outputs of all chunks without losing the overall context?

wrs 738 days ago

The output is just individual line items from the invoices, so all you have to do is concatenate the outputs of the chunks. If there was data that crossed a page, it would have been harder!
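
Roughly, the chunk-and-concatenate flow looks like the sketch below. It swaps in a fixed pages-per-chunk split in place of the heuristic layer, and `extract_chunk` stands in for whatever call sends a chunk to the model and parses the reply. Both names, like the sample pages, are placeholders.

```python
from typing import Callable, Iterator

def chunk_pages(pages: list[str], pages_per_chunk: int = 2) -> Iterator[list[str]]:
    """Yield consecutive groups of pages, each small enough for one model call."""
    for start in range(0, len(pages), pages_per_chunk):
        yield pages[start:start + pages_per_chunk]

def extract_all(
    pages: list[str],
    extract_chunk: Callable[[list[str]], list[dict]],
    pages_per_chunk: int = 2,
) -> list[dict]:
    """Extract line items chunk by chunk and concatenate them in page order."""
    items: list[dict] = []
    for chunk in chunk_pages(pages, pages_per_chunk):
        items.extend(extract_chunk(chunk))
    return items

# Stand-in extractor so the sketch runs offline: one fake item per page.
def fake_extract(chunk: list[str]) -> list[dict]:
    return [{"source": page} for page in chunk]

pages = ["page 1", "page 2", "page 3"]
items = extract_all(pages, fake_extract)
```

Plain concatenation only works because each line item stands alone. A line item or table that straddles a chunk boundary would need overlapping chunks or a merge step.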

bronco21016 741 days ago

Have you written about this anywhere? Would love to know more about the process you're using!
