Hacker News new | ask | show | jobs
by ocrcustomserver 2523 days ago
"The problem is that while Textract works really well for well defined tabular data it does not work for tables where the rows and columns are implied with white space, instead of lines."

This is what Tabula and Camelot call "Stream" and "Lattice" parsing methods.

Whitespace between cells is Stream, demarcated lines is Lattice.