by gharman
1148 days ago
Indeed! I built a system just last year with (count ’em) three parsers to deal with PDF table extraction, including one built on TableTransformer. Then when GPT4 came out I just copy-pasted a PDF into it as-is, and darned if it didn’t do at least as good a job. I can’t do this in earnest because of document privacy issues, but I’ve been diving down the rabbit hole of how small we can go and still get decent results. Spoiler: GPT2 is too small. :-)

I was thinking: a) use the metric used in TableTransformer to detect the structured data; b) use the MarkupLM model, maybe mixed with TableTransformer; c) find a way to work directly with GPT4.