Hacker News
by jerkstate 76 days ago
After looking into it for a little while, I've found that Docling and Marker work pretty well but are very slow. I haven't found anything else that extracts math suitably. Extraction takes 10+ minutes per PDF, so I'm going to run it on a batch of these papers overnight and create my own little Gaussian splatting RAG database. It's really too bad PDF is so terrible.
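
For what it's worth, here is a rough sketch of how the overnight batch run could be structured. It is only an assumption about the setup, not a tested pipeline: `convert_batch` and its `convert` argument are hypothetical names, and `convert` stands in for whatever Docling or Marker call actually turns one PDF into Markdown. Because each PDF takes minutes, the loop skips files that already have output, so a crashed run can be restarted without redoing work.

```python
from pathlib import Path
from typing import Callable


def convert_batch(
    pdf_dir: Path,
    out_dir: Path,
    convert: Callable[[Path], str],  # hypothetical: wraps the real extractor
) -> list[Path]:
    """Convert every PDF in pdf_dir to Markdown; return the PDFs that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed: list[Path] = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        target = out_dir / (pdf.stem + ".md")
        if target.exists():
            continue  # already converted on an earlier run
        try:
            text = convert(pdf)
        except Exception as exc:
            # One bad PDF should not kill the whole overnight run.
            print(f"failed: {pdf.name}: {exc}")
            failed.append(pdf)
            continue
        target.write_text(text, encoding="utf-8")
    return failed
```

The returned list makes it easy to retry just the failures in the morning, possibly with the other tool.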