Hacker News
by daveguy 223 days ago
These are all multimodal models, right? And Gemini's vision capabilities are particularly touted.

https://ai.google.dev/gemini-api/docs/image-understanding