Hacker News
by constantinum 641 days ago
There is also LLMWhisperer, a document pre-processor specifically made for LLM consumption.

As others mentioned, accuracy is only one of the criteria for a solution. Others include how the preprocessing engine performs at scale and how it handles very complex documents, such as bank loan forms with checkboxes or IRS tax forms with multi-layered nested tables.

https://unstract.com/llmwhisperer/

LLMWhisperer is part of Unstract, an open-source tool for unstructured document ETL.

https://github.com/Zipstack/unstract