by nanoamp 2347 days ago

I can see the use case and the potential for ML in extracting tables, but I'd be worried about decision-making mistakes in environments the author identifies, such as finance.

The example of TableNet, which uses deep learning for table extraction on top of Tesseract for OCR, means two layers of ML, either of which could introduce pathologies on its own without human oversight. It reminds me of the photocopier that changed numbers for you - https://www.theregister.co.uk/2013/08/06/xerox_copier_flaw_m...

If an ML engine were trained to look for totals and sub-totals in numerical tables and flag errors in summation, that would clearly add more value when parsing for moderation (the use case described at the end). But that doesn't seem to be something that's yet... on the table.
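A minimal Python sketch of the summation check described above, done as a deterministic pass over already-extracted rows rather than by an ML model. It assumes the table arrives as (label, value) pairs and uses a hypothetical labelling convention: a "Total ..." row should equal the item rows since the previous total, and a "Grand total" row should equal the reported totals. The function name, row layout and tolerance are illustrative assumptions, not part of TableNet or any other tool mentioned here.

```python
from decimal import Decimal
from typing import List, Tuple

Row = Tuple[str, Decimal]


def flag_summation_errors(
    rows: List[Row], tolerance: Decimal = Decimal("0.01")
) -> List[Tuple[str, Decimal, Decimal]]:
    """Return (label, expected, found) for each total row that fails its check.

    Hypothetical layout: a row labelled "Total ..." closes the group of item
    rows above it; a "Grand total" row should equal the reported totals.
    """
    errors = []
    group_sum = Decimal(0)   # running sum of item rows since the last total
    totals_sum = Decimal(0)  # running sum of reported (not recomputed) totals
    for label, value in rows:
        name = label.strip().lower()
        if name.startswith("grand total"):
            expected = totals_sum
        elif name.startswith("total"):
            expected = group_sum
            totals_sum += value
            group_sum = Decimal(0)
        else:
            group_sum += value
            continue
        if abs(expected - value) > tolerance:
            errors.append((label, expected, value))
    return errors


example_rows = [
    ("Rent", Decimal("1200.00")),
    ("Utilities", Decimal("180.50")),
    ("Total housing", Decimal("1380.50")),
    ("Food", Decimal("400.00")),
    ("Transport", Decimal("95.00")),
    ("Total living", Decimal("459.00")),  # digits transposed: should be 495.00
    ("Grand total", Decimal("1839.50")),
]

if __name__ == "__main__":
    print(flag_summation_errors(example_rows))
```

The example flags only "Total living", where 495.00 appears as 459.00: the kind of transposed digits an OCR or extraction layer could introduce silently. Because the grand total is checked against the reported totals, it agrees with the faulty sub-total and is not flagged; recomputing from item rows instead is an equally valid design choice.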


fny 2347 days ago

There's a really interesting project from Microsoft Research that does just that:

https://www.microsoft.com/en-us/research/publication/melford...

nanoamp 2347 days ago

It looks like it's not quite the same thing, in that it identifies Excel values that should be formulae. It could be used in a pipeline with spreadsheets extracted by ML/OCR to reverse-engineer formulae though, which is an interesting prospect.

tastyminerals 2346 days ago

Yes, that's why in the financial domain you use rules, with ML as a fallback.
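A minimal sketch of the rules-first, ML-as-fallback pattern, assuming table cells arrive as OCR'd strings. The regex rule and the names `parse_by_rule`, `extract_amount` and `toy_model` are hypothetical; `toy_model` is a character-substitution stand-in, not a real ML model, and a real pipeline would pass its own model as the `fallback` callable.

```python
import re
from decimal import Decimal
from typing import Callable, Optional, Tuple

# Hypothetical rule: digits with optional comma grouping and two decimal places.
AMOUNT_RE = re.compile(r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?")


def parse_by_rule(text: str) -> Optional[Decimal]:
    """Deterministic parse; returns None when the rule does not apply."""
    s = text.strip()
    negative = False
    if s.startswith("(") and s.endswith(")"):
        # Accounting convention: (250.00) means -250.00.
        negative, s = True, s[1:-1].strip()
    elif s.startswith("-"):
        negative, s = True, s[1:]
    if not AMOUNT_RE.fullmatch(s):
        return None
    value = Decimal(s.replace(",", ""))
    return -value if negative else value


def toy_model(text: str) -> Optional[Decimal]:
    """Stand-in for an ML model: undo common OCR letter/digit confusions."""
    fixed = text.translate(str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"}))
    return parse_by_rule(fixed)


def extract_amount(
    text: str, fallback: Callable[[str], Optional[Decimal]]
) -> Tuple[Optional[Decimal], str]:
    """Try the rule first; only call the fallback when the rule rejects the cell."""
    value = parse_by_rule(text)
    if value is not None:
        return value, "rule"
    return fallback(text), "fallback"


if __name__ == "__main__":
    for cell in ["1,234.56", "(250.00)", "1O0.00", "n/a"]:
        print(cell, extract_amount(cell, toy_model))
```

Returning the source alongside each value lets a reviewer audit every number the fallback produced, which speaks to the human-oversight concern raised at the top of the thread.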