Hacker News
by paldepind2 81 days ago
Sorry if this is a basic question, but what's your workflow for feeding the papers into the LLM and getting the implementation done? The coding agents that I've used are not able to read PDFs, so I've been wondering how to do it.

This is actually a great question. I just extract the text with PyPDF, but I did a brief search for the functionality I'd like to have (converting math equations to LaTeX, extracting images, reformatting as markdown, extracting data from charts), and it looks like there are a couple of promising Python libraries, such as Docling and Marker. I should really improve this part of my workflow.
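For anyone who wants the plain text extraction step spelled out, here is a minimal sketch. It assumes the `pypdf` package (`pip install pypdf`); the function names `extract_pages` and `pages_to_markdown` are made up for illustration, and this is not necessarily the exact script described above. It only pulls raw text, so math, images, and charts are not handled.

```python
from pathlib import Path


def extract_pages(pdf_path):
    """Return the text of each page of a PDF as a list of strings."""
    # Imported lazily so pages_to_markdown() works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # Scanned or image-only pages may yield no text, so fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_markdown(title, pages):
    """Join page texts into one markdown document with a heading per page."""
    parts = [f"# {title}"]
    for number, text in enumerate(pages, start=1):
        parts.append(f"## Page {number}\n\n{text.strip()}")
    return "\n\n".join(parts) + "\n"
```

Usage would be something like `pages_to_markdown("paper", extract_pages("paper.pdf"))`, with the result written to a `.md` file that a coding agent can read. The per-page headings are just one possible layout; they make it easier to point the LLM at a specific page.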
After looking into it for a little while: Docling and Marker work pretty well but are very slow. I haven't found anything else that extracts math suitably. Conversion takes 10+ minutes per PDF, so I'm going to run it on a batch of these papers overnight and create my own little Gaussian splatting RAG database. It's really too bad PDF is so terrible.
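A rough sketch of how the overnight batch could be driven, under some assumptions: `convert_batch` is a hypothetical helper, and `convert` stands for whatever wrapper you write around Docling or Marker (their actual APIs are not shown here). Since each PDF is slow, the loop skips files that already have output and keeps going when a single paper fails.

```python
from pathlib import Path


def convert_batch(pdf_dir, out_dir, convert):
    """Convert every PDF in pdf_dir to markdown, skipping finished ones.

    `convert` is any callable that takes a PDF path and returns markdown
    text (an assumption; plug in your own Docling or Marker wrapper).
    Returns (converted, failed) lists of file names.
    """
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        target = out_dir / (pdf.stem + ".md")
        if target.exists():
            continue  # done in an earlier run, so don't repeat slow work
        try:
            text = convert(pdf)
        except Exception as exc:  # one bad PDF shouldn't stop the batch
            print(f"failed: {pdf.name}: {exc}")
            failed.append(pdf.name)
            continue
        target.write_text(text, encoding="utf-8")
        converted.append(pdf.name)
    return converted, failed
```

Because finished files are skipped, the script can be restarted after a crash without redoing the papers that already converted. The resulting markdown files would then be the input for whatever chunking and indexing the RAG database uses.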