|
|
|
|
|
by yigitkonur35
641 days ago
|
|
You've got a point, but try testing it on a tricky example like the Apollo 17 document - you know, with those sideways tables and old-school writing. You'll see all three non-AI services totally bomb. Now, if you tweak it to batch = 1 instead of 10, you'll notice there's hardly any made-up stuff. When you dial down the temperature close to zero, it's super unlikely to see hallucinations with limited context. At worst, you might get some skipped bits, but that's not a dealbreaker for folks looking to feed PDFs into AI systems. Let's face it, regular OCR already messes up so much that... |
|
You wouldn’t get a markdown document automatically generated (or at least you couldn’t when I last used it a few years ago) but you did get an XML document
That XML document was actually better for our purposes because it gives you a confidence score and is properly structured, so floating frame, tables and columns would be properly structured in the output document. This reduces the risk of hallucinations.
It’s less of an out-of-the-box solution but that’s to be expected with AWS APIs.