Hacker News new | ask | show | jobs
by acid__ 804 days ago
Top multimodal models have about 200-1000 tokens per image, so the math works out.