by Clueed
928 days ago
A while ago I looked into the available options for parsing PDFs, including pypdf, which is what is being used here, and they're not good. While I haven't tested equations specifically, I think it's fair to assume that the results will be subpar, especially for complex ones. I guess this could be an application of the agent model. I've recently seen multiple LLMs trained specifically on LaTeX parsing. One model would recognize from the parsed PDF garbage that there is probably an equation there and call a different one to parse it.
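To make the routing idea concrete, here is a rough sketch under stated assumptions rather than a tested pipeline. The text extraction uses pypdf's `PdfReader` and `extract_text()`. Everything else is hypothetical: `looks_like_equation` is a crude character-ratio heuristic standing in for the model that would spot equation garbage, and `parse_equation` is a placeholder callable for whatever LaTeX-specialized model would be called.

```python
import re
from typing import Callable, Iterable, List

# Characters that tend to survive when typeset math is extracted as plain text.
MATH_CHARS = set("=+-*/^_<>|(){}[]∑∫∏√≤≥≠≈∞∂∇±×·")
GREEK = re.compile(r"[\u0370-\u03FF]")


def extract_lines(path: str) -> List[str]:
    """Pull raw text lines out of a PDF with pypdf (third-party package)."""
    from pypdf import PdfReader  # imported lazily so the rest runs without pypdf

    lines: List[str] = []
    for page in PdfReader(path).pages:
        lines.extend((page.extract_text() or "").splitlines())
    return lines


def looks_like_equation(line: str, threshold: float = 0.2) -> bool:
    """Hypothetical heuristic: flag lines dense in math symbols or Greek letters.

    A real system would probably use a classifier or an LLM here; the
    threshold is an arbitrary assumption.
    """
    chars = [c for c in line if not c.isspace()]
    if len(chars) < 3:
        return False
    hits = sum(1 for c in chars if c in MATH_CHARS or GREEK.match(c))
    return hits / len(chars) >= threshold


def route_lines(
    lines: Iterable[str], parse_equation: Callable[[str], str]
) -> List[str]:
    """Send suspected equation lines to parse_equation, pass the rest through."""
    return [
        parse_equation(line) if looks_like_equation(line) else line
        for line in lines
    ]
```

In use, `route_lines(extract_lines("paper.pdf"), parse_equation=my_latex_model)` would leave ordinary prose untouched and hand only the suspicious lines to the second model, where `my_latex_model` is whatever equation parser you plug in. Working line by line is a simplification; extracted equations often spill across several lines.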