by shekhar101 812 days ago
What's the name of the layout recogniser model? I haven't had a good experience extracting layout from tables, especially borderless ones (where whitespace rather than ruled lines separates the columns).
2 comments

It's https://huggingface.co/InfiniFlow/deepdoc and the usage code is in https://github.com/infiniflow/ragflow/blob/main/deepdoc/READ... It took me a bit of trial and error to get it working.

It seems to be a YOLOv8 fine-tune. I only ran a couple of tests, but the results were decent. Another model that is supposed to be fine-tuned for borderless tables is https://huggingface.co/keremberke/yolov8m-table-extraction. I haven't had great results with it myself, but it may be worth a try for you.
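If you want a cheap baseline to compare either model against on borderless tables, a whitespace-projection heuristic is easy to try. This is only a rough sketch, and it assumes the table has already been extracted as fixed-width text rows, which is not how the models above work (they take page images). The names `find_column_spans`, `split_row` and `min_gap` are made up for illustration and are not part of deepdoc or ragflow.

```python
def find_column_spans(lines, min_gap=2):
    """Return (start, end) character spans of columns in fixed-width text rows.

    A column boundary is assumed wherever at least `min_gap` consecutive
    character positions are blank in every row.
    """
    width = max((len(line) for line in lines), default=0)

    # A position is "occupied" if any row has a non-space character there.
    occupied = [False] * width
    for line in lines:
        for i, ch in enumerate(line):
            if not ch.isspace():
                occupied[i] = True

    spans = []
    start = None  # start index of the column currently being built
    gap = 0       # length of the current run of empty positions
    for i, filled in enumerate(occupied):
        if filled:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The column ended where the blank run began.
                spans.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))
    return spans


def split_row(line, spans):
    """Cut one row into cell strings using the detected spans."""
    return [line[a:b].strip() for a, b in spans]


# Hypothetical sample data: three rows with no ruled lines between columns.
rows = [
    "Name      Qty   Price",
    "Widget    12    3.50",
    "Gadget    7     12.00",
]
spans = find_column_spans(rows)
table = [split_row(r, spans) for r in rows]
```

It falls over as soon as a cell contains a run of spaces at least `min_gap` wide or the columns are not aligned across rows, which is pretty much why people reach for a detection model in the first place.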

Thank you very much!
Here's a quick test to run: if you have Windows and MS Office, open your PDF in Word (File->Open) and report the results. You might be surprised at the layout extraction quality.