by daveguy 223 days ago
These are all multi-modal models, right? And the vision capabilities are particularly touted in Gemini.
https://ai.google.dev/gemini-api/docs/image-understanding