Hacker News
by ekabod 1139 days ago
If you were asked to extract lists or tables from HTML pages only, how would you go about it?

I was thinking of: a) using the metric used in TableTransformer to detect the structured data; b) using the MarkupLM model, maybe combined with TableTransformer; c) finding a way to work directly with GPT-4.
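
For comparison with the three options, here is a minimal non-ML baseline sketch. It is not an implementation of any of them. It assumes the pages use real `<table>`, `<ul>` and `<ol>` markup (a big assumption for layouts built from `<div>`s, which is where the model-based options would come in). It uses only Python's standard `html.parser`. The names `StructuredDataExtractor` and `extract_structured_data` are made up for illustration.

```python
from html.parser import HTMLParser


class StructuredDataExtractor(HTMLParser):
    """Collect <table> rows and <ul>/<ol> items (hypothetical helper).

    Simplifications: nested tables and nested lists are flattened into the
    outermost one, and lists inside a table are kept as plain cell text.
    """

    def __init__(self):
        super().__init__()
        self.tables = []      # list of tables; a table is a list of rows
        self.lists = []       # list of lists; a list is a list of item strings
        self._table_depth = 0
        self._list_depth = 0
        self._cell = None     # text chunks of the open cell/item, else None
        self._target = None   # row or list that the open cell belongs to

    def _open_cell(self, target):
        self._close_cell()    # tolerate unclosed <li>/<td> tags
        self._cell, self._target = [], target

    def _close_cell(self):
        if self._cell is not None:
            # Join chunks and collapse runs of whitespace.
            self._target.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table_depth += 1
            if self._table_depth == 1:
                self.tables.append([])
        elif tag == "tr" and self._table_depth:
            self._close_cell()
            self.tables[-1].append([])
        elif tag in ("td", "th") and self._table_depth and self.tables[-1]:
            self._open_cell(self.tables[-1][-1])
        elif tag in ("ul", "ol") and not self._table_depth:
            self._list_depth += 1
            if self._list_depth == 1:
                self.lists.append([])
        elif tag == "li" and self._list_depth:
            self._open_cell(self.lists[-1])

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr") and self._table_depth:
            self._close_cell()
        elif tag == "table" and self._table_depth:
            self._close_cell()
            self._table_depth -= 1
        elif tag == "li" and self._list_depth:
            self._close_cell()
        elif tag in ("ul", "ol") and self._list_depth and not self._table_depth:
            self._close_cell()
            self._list_depth -= 1

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_structured_data(html):
    """Return (tables, lists) found in an HTML string."""
    parser = StructuredDataExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables, parser.lists


sample = """
<ul><li>alpha</li><li>beta</li></ul>
<table>
  <tr><th>name</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
</table>
"""
tables, lists = extract_structured_data(sample)
print(tables)
print(lists)
```

Something like this only catches data that is already tagged as a table or list. Anything laid out visually with generic containers would still need a detection step such as a), b) or c).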