by kuter
962 days ago
Took a peek at the models they use. It seems to be a Vision Transformer encoder-decoder architecture with a ResNet backbone. Looks really good. I had a similar idea of training a model and building a desktop application around it, but haven't had the opportunity. I wonder how much compute it took to train the model. I think this paper was the first one to do OCR for LaTeX: http://cs231n.stanford.edu/reports/2017/pdfs/815.pdf
The paper describes an encoder-decoder architecture with a CNN encoder and an LSTM-based decoder.

Some recent cool work he's been doing: https://www.youtube.com/watch?v=lx1XcTdhalU.