Hacker News
by anon291 15 days ago
I mean, I can't know for sure, but I'm pretty sure that by the time the upper layers of the network are reached, the lower layers have already transformed the image tiles into properly position-encoded embeddings of the tokens in the words in the image.

That would be my operating assumption at least.

1 comment

The encoded data and the font file go over the network. The font file is executed locally on the reader's machine.
My point is that you can just render the file and send the result to the AI...