by gregw2 598 days ago
I have real-world bank statements that no PDF/AI extractor I have tried handles well.

(To summarize, the core challenge appears to be recognizing nested columnar layout formats combined with odd line wrapping within those columns.)

Is there anyone I can submit a few example pages to for consideration in some benchmark?

1 comment

Happy to add examples to future iterations of this dataset if you want to send some!