| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bongodongobob 718 days ago
	You don't actually think the LLM is reviewing those 30k documents do you? You tell it to write a program (which is easy to audit) to pull the info from the PDFs or whatever. I don't get why this crowd is so goddamn unimaginative with LLMs.

2 comments

svieira 718 days ago

> You tell it to write a program (which is easy to audit) to pull the info from the PDFs

Wherein you discover that unless you ask it to consider the fact that PDFs are ... very hard to parse [1] [2] you get something that misses whole blocks of text or turns them into something they aren't and the rest of the program misses chunks of the document.

[1]: https://news.ycombinator.com/item?id=22473263 [2]: https://web.archive.org/web/20200303102734/https://www.filin...

link

bongodongobob 718 days ago

Why are you expecting they are all very different? They're all likely very similar.

link

svieira 717 days ago

Because presuming that all of them are produced by the same utility is a _presumption_. They could be - but they could also be produced by many different vendors using many different methods all of them simply conforming to the specification "a PDF with HIGH LEVEL DESCRIPTION OF THE DATA".

link

goatlover 718 days ago

Because I've heard of enough lazy uses of LLMs to be suspicious. Auditing the program means being sure that the info pulled from those documents is reviewed properly. Also, a complete lack of regard for other people's privacy.

link

bongodongobob 718 days ago

No idea where privacy enters in here.

link