by sushid 704 days ago
Is that not just traditional OCR applied on top of an LLM?
2 comments

It's possible they have a software layer that does that. But I was assuming they don't, because the open source multimodal models don't.
No, it’s not. It’s a multimodal transformer model.