Hacker News
by
sushid
704 days ago
Isn't that just traditional OCR layered on top of an LLM?
2 comments
energy123
704 days ago
It's possible they have a software layer that does that, but I was assuming they don't, because the open-source multimodal models don't.
maxlamb
703 days ago
No, it’s not. It’s a multimodal transformer model.