| HN Mirror

	by sushid 751 days ago
	Is that not just traditional OCR applied on top of an LLM?

2 comments

It's possible they have a software layer that does that. But I was assuming they don't, because the open source multimodal models don't.

No, it’s not. It’s a multimodal transformer model.