Hacker News
by fnordpiglet 446 days ago
I use tesseract, which has an LSTM-based OCR engine, together with multimodal LLMs to converge on a ground truth. It works remarkably well. However, for my purposes I don’t want an LLM explaining charts; I want it to produce a vector format of the chart. There are a few models that produce LaTeX chart formats that I’m experimenting with:

https://arxiv.org/pdf/2405.15306
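The comment doesn’t say how the tesseract and LLM outputs are reconciled into a ground truth. A minimal sketch, assuming each engine (for example tesseract run with `--oem 1`, which selects its LSTM engine, plus one or more multimodal LLMs) returns a plain-text transcription of the same page, is to keep the candidate that agrees most with all the others (the medoid). The function names and sample strings below are hypothetical illustrations, not the commenter’s method.

```python
from difflib import SequenceMatcher
from typing import Sequence


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] between two transcriptions."""
    return SequenceMatcher(None, a, b).ratio()


def consensus_transcription(candidates: Sequence[str]) -> str:
    """Return the candidate most similar to all other candidates (the medoid).

    Each candidate is a full-page transcription from a different engine,
    e.g. tesseract plus multimodal LLMs (hypothetical inputs).
    """
    if not candidates:
        raise ValueError("need at least one transcription")
    best_text, best_score = candidates[0], float("-inf")
    for i, text in enumerate(candidates):
        # Total agreement of this candidate with every *other* candidate.
        score = sum(
            similarity(text, other)
            for j, other in enumerate(candidates)
            if j != i
        )
        if score > best_score:
            best_text, best_score = text, score
    return best_text


if __name__ == "__main__":
    ocr = "The equatlon of state is pV = nRT."    # OCR misread i -> l
    llm_a = "The equation of state is pV = nRT."
    llm_b = "The equation of state is PV = nRT."  # LLM changed p -> P
    print(consensus_transcription([ocr, llm_a, llm_b]))
```

Here `llm_a` wins because it differs from each of the other two by only one character. Picking a whole candidate is deliberately simple; a character- or word-level vote across aligned outputs (ROVER-style) could correct errors that every single engine makes somewhere, at the cost of a multiple-alignment step.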

Most OCR pipelines like this, including excellent commercial ones like doctly.ai, are focused on OCR for LLM consumption. I’d like to be able to re-typeset, in modern form, the original scientific work that predates digital typesetting: yes, for LLMs, but also to preserve and promote the science of yore, much of which contains forgotten discoveries that are still relevant to problems we face today.