| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by anon291 51 days ago
	Most llms can equally engage with text in picture form as text in token form. In fact my initial research on this (later corroborated by actual published papers) indicate that this is a cheap way to save on tokens.

1 comments

billtarbell 51 days ago

Oh interesting and good to know on the token savings with this technique. My test with claude had it use vision and then programmatically test different variable font input variables (mimicking the user scrub interaction) until it was able to OCR it.

link

anon291 50 days ago

I mean I can't know for sure but I'm pretty sure that by the time the upper layers of the network are reached the lower level networks have already transformed the image tiles into proper position encoded embeddings of the tokens in the words in the image.

That would be my operating assumption at least.

link

billtarbell 50 days ago

The encoded data and the font file go over the network. The font file is executed locally on the readers machine.

link

anon291 49 days ago

My point is you can just render the file and send to AI...

link