HN Mirror

by nishparadox 1928 days ago

This is neat. Over at Docsumo, I've had fun building one of the pipelines [0] that extracts tables from any kind of document. Our older pipelines used image-processing-based approaches. However, they baked in too many assumptions (for instance, about header text, column types, etc.).

Now, we've moved on to an ML-based approach, training generic models that can be applied to a variety of documents for table structure recognition.

[0] - https://docsumo.com/free-tools/extract-tables-from-pdf-image...