by acid__ 844 days ago
Wow, only 256 tokens per frame? I guess a picture isn’t worth a thousand words, just ~192.
2 comments

Back in 2020, Google was saying 16x16=256 words: https://arxiv.org/abs/2010.11929#google :)
GPT-4V is also pretty low, but not as low: a 480x640 frame costs 425 tokens, and a 780x1080 frame costs 1105 tokens.
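
For anyone wondering where those two numbers come from: as far as I can tell they match the high-detail image pricing rule OpenAI documented for GPT-4V at the time (fit within 2048x2048, shrink so the shortest side is at most 768, then 170 tokens per 512px tile plus a flat 85). A rough sketch of that arithmetic is below. The function name `high_detail_tokens` is made up for illustration, and the "never upscale small images" behavior is an assumption inferred from the 480x640 figure, not something taken from an official implementation.

```python
import math


def high_detail_tokens(width: int, height: int) -> int:
    """Estimate GPT-4V high-detail image tokens (hypothetical helper).

    Assumes the documented rule: 170 tokens per 512px tile + 85 base tokens.
    Assumes images are only ever scaled down, never up.
    """
    # Step 1: shrink to fit inside a 2048x2048 square, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2: shrink so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # Step 3: count the 512px tiles needed to cover the image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 170 * tiles + 85


print(high_detail_tokens(480, 640))   # 2 tiles
print(high_detail_tokens(780, 1080))  # 6 tiles
```

So the 480x640 frame is 2 tiles and the 780x1080 frame is 6 tiles, which is where 425 and 1105 come from under these assumptions.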