Hacker News
by cafard 2339 days ago
I was very impressed with "Camelot" (https://camelot-py.readthedocs.io/en/master/). My impression was that it extracted maybe 80 or 90% of the text properly, far better than anything else I had tried.