by bongodongobob 718 days ago
You don't actually think the LLM is reviewing those 30k documents, do you? You tell it to write a program (which is easy to audit) to pull the info from the PDFs or whatever. I don't get why this crowd is so goddamn unimaginative with LLMs.

> You tell it to write a program (which is easy to audit) to pull the info from the PDFs

Wherein you discover that unless you ask it to consider the fact that PDFs are ... very hard to parse [1] [2], you get something that misses whole blocks of text or turns them into something they aren't, and the rest of the program then misses chunks of the document.

[1]: https://news.ycombinator.com/item?id=22473263
[2]: https://web.archive.org/web/20200303102734/https://www.filin...
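
For what it's worth, here is a minimal sketch of the kind of sanity check that makes the "easy to audit" part a bit more real. It assumes you already have the extracted text of each page as a list of strings from whatever PDF library you use; the name `flag_suspicious_pages` and the thresholds `min_chars` and `max_bad_ratio` are made up for illustration, not taken from any tool. It only catches pages that come back empty, nearly empty, or full of junk characters. It will not notice a page that is plausible but wrong.

```python
from dataclasses import dataclass


@dataclass
class PageFlag:
    page: int    # 1-based page number
    reason: str  # why the page needs a human look


def flag_suspicious_pages(page_texts, min_chars=40, max_bad_ratio=0.05):
    """Flag pages whose extracted text is empty, very short, or garbled.

    page_texts: one string (or None) per page, in page order.
    min_chars and max_bad_ratio are arbitrary illustrative thresholds.
    """
    flags = []
    for number, text in enumerate(page_texts, start=1):
        stripped = (text or "").strip()
        if not stripped:
            flags.append(PageFlag(number, "no text extracted"))
            continue
        if len(stripped) < min_chars:
            flags.append(PageFlag(number, "very little text"))
            continue
        # Count replacement characters and other non-printable, non-space characters.
        bad = sum(
            1
            for ch in stripped
            if ch == "\ufffd" or (not ch.isprintable() and not ch.isspace())
        )
        if bad / len(stripped) > max_bad_ratio:
            flags.append(PageFlag(number, "many unprintable characters"))
    return flags
```

Anything it flags goes to a person (or to OCR) instead of silently flowing into the rest of the pipeline.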

Why are you expecting they are all very different? They're all likely very similar.
Because presuming that all of them are produced by the same utility is a _presumption_. They could be, but they could also be produced by many different vendors using many different methods, all of them simply conforming to the specification "a PDF with HIGH LEVEL DESCRIPTION OF THE DATA".
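
One cheap way to test that presumption before writing any extraction code is to tally which tool claims to have produced each file. A rough sketch, assuming you have already pulled one producer string per document out of the PDF metadata (the `/Producer` entry, which may be missing or wrong); `producer_breakdown` is a hypothetical helper name:

```python
from collections import Counter


def producer_breakdown(producers):
    """Count documents per producing tool, most common first.

    producers: one string (or None) per document.
    Missing or blank values are grouped under "(unknown)".
    """
    counts = Counter((p or "").strip() or "(unknown)" for p in producers)
    return counts.most_common()
```

If that comes back with one entry, the "they're all similar" guess looks reasonable. If it comes back with dozens, or mostly "(unknown)", a single extraction program probably needs per-source handling.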
Because I've heard of enough lazy uses of LLMs to be suspicious. Auditing the program means being sure that the info pulled from those documents is reviewed properly. Also, a complete lack of regard for other people's privacy.
No idea where privacy enters in here.