by kuter 962 days ago
Took a peek at the models they use. It seems to be a vision transformer encoder-decoder architecture with a ResNet backbone. Looks really good. I had a similar idea of training a model and making a desktop application, but haven't had the opportunity. I wonder how much compute it took to train the model.

I think this paper was the first one to do OCR for LaTeX: http://cs231n.stanford.edu/reports/2017/pdfs/815.pdf It describes an encoder-decoder architecture with a CNN encoder and an LSTM-based decoder.

1 comments

Want to give proper credit to my former student for starting this: Yuntian Deng et al., 2016 (https://arxiv.org/abs/1609.04938). I believe this repo uses the dataset from that paper.

Some recent cool work he's been doing: https://www.youtube.com/watch?v=lx1XcTdhalU