| HN Mirror

	by mingtianzhang 234 days ago
	VLMs can already process both the document images and the query to produce an answer directly. Do we still need the intermediate OCR step?