| HN Mirror

	by cnxhk 237 days ago
	The paper is quite interesting, but efficiency on OCR tasks does not mean it could be plugged directly into a general LLM without performance loss. If you train a tokenizer only on OCR text, you might already get better compression.
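
	To make the tokenizer point concrete, here is a toy sketch (not from the paper): a tiny character-level BPE trained once on invoice-like text standing in for OCR output and once on generic prose, with both then used to encode an unseen invoice line. The corpora, the function names (`train_bpe`, `apply_merge`, `encode`), and the merge count are all made up for illustration; a real comparison would need a real tokenizer and real OCR data.

```python
from collections import Counter


def apply_merge(tokens, a, b):
    """Replace each adjacent (a, b) pair with the single token a + b."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(corpus, num_merges):
    """Learn up to num_merges merge rules, starting from single characters."""
    tokens, merges = list(corpus), []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats any more
            break
        merges.append((a, b))
        tokens = apply_merge(tokens, a, b)
    return merges


def encode(text, merges):
    """Apply the learned merges to text in the order they were learned."""
    tokens = list(text)
    for a, b in merges:
        tokens = apply_merge(tokens, a, b)
    return tokens


# Made-up stand-ins: "OCR-like" invoice lines vs. generic prose (assumptions).
domain_corpus = (
    "Invoice No. 1042 Total: $19.99\n"
    "Invoice No. 1043 Total: $7.50\n"
    "Invoice No. 1044 Total: $120.00\n"
) * 5
general_corpus = (
    "the quick brown fox jumps over the lazy dog. "
    "the cat sat on the mat and the dog ran to the park. "
) * 5

domain_merges = train_bpe(domain_corpus, 20)
general_merges = train_bpe(general_corpus, 20)

sample = "Invoice No. 1077 Total: $24.50\n"  # unseen invoice line
print("chars:", len(sample))
print("domain-trained tokens:", len(encode(sample, domain_merges)))
print("general-trained tokens:", len(encode(sample, general_merges)))
```

	On this toy data the domain-trained merges need fewer tokens for the invoice line than the general-trained ones, which is the sense in which a tokenizer trained only on OCR text might already compress better. It says nothing about how the paper's approach would compare.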