	by ComputerGuru 597 days ago
	I’d be very interested in the opposite! I have lots of scanned or legacy images that would be nice to convert to LaTeX, or to feed into a robust PDF ingestion pipeline.

4 comments

Vetch 597 days ago

In addition to the already mentioned https://huggingface.co/facebook/nougat-base, I also highly recommend https://huggingface.co/stepfun-ai/GOT-OCR2_0. It might even be better.


hextex 597 days ago

Facebook's Nougat [1] should work for this, but I'm not sure how much preprocessing is needed to get good results from scanned copies of physical documents. Note that it outputs .mmd files (a Markdown dialect compatible with Mathpix Markdown, not MultiMarkdown), but the equations and tables should (iirc) come out as plain LaTeX.

[1] https://github.com/facebookresearch/nougat


Wdorf 596 days ago

This looks really interesting! I will definitely have a look.


sorenjan 597 days ago

Like Mathpix?

https://mathpix.com/


BeetleB 597 days ago

Mathpix is awesome. So far it has never gotten the output wrong. I even have it integrated into Emacs/org-mode.


Wdorf 597 days ago

I could add GPT-based PDF-to-LaTeX functionality in the future.
