| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by KhoomeiK 763 days ago
	That's an interesting problem—Tarsier probably isn't the best solution here since it's focused on webpage perception rather than any kind of OCR. But one could try adapting the `format_text` function in tarsier/text_format.py to convert any set of OCR annotations to a whitespace-structured string. Curious to see if that works.