Indeed, Gemini really is incredible at image analysis. Yesterday I pointed it at some sloppy handwritten notes and asked it to add up the numbers in the right column, and it did it no problem. I've also used it to find out what TV show or actor is on screen, and various other things. It's quite impressive.
> Indeed, Gemini really is incredible at image analysis. Yesterday I pointed it at some sloppy handwritten notes and asked it to add up the numbers in the right column, and it did it no problem. I've also used it to find out what TV show or actor is on screen, and various other things. It's quite impressive.
I do not know if it works as well as Gemini, but Salesforce (of all places) has a model that does something similar.
What's "neat" about the Salesforce one is that you can run it locally and just iterate it over as many images as you feel like.
For instance, it should be possible to take a movie, pull a hundred frames out of the H.265 file, have the Salesforce model describe what is happening at each of those moments, and then use the descriptions to build an index.
That's just ONE use for it, and I can think of dozens.
On a 5090 it was able to generate text descriptions of a folder full of approximately 500 images in under a minute. (Anecdotal evidence, admittedly.)
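To make the movie-index idea a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption on my part: the function names are made up for illustration, it expects `ffmpeg` to be on the PATH, and `caption` is just a placeholder callable where whatever local image-description model you use would be plugged in. It is not the Salesforce model's actual API.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List


def sample_timestamps(duration_s: float, n_frames: int) -> List[float]:
    """Evenly spaced timestamps: the midpoint of each of n_frames equal segments."""
    step = duration_s / n_frames
    return [round(step * (i + 0.5), 3) for i in range(n_frames)]


def ffmpeg_frame_command(video: str, timestamp_s: float, out_path: str) -> List[str]:
    """Build an ffmpeg command that seeks to timestamp_s and writes one frame."""
    return ["ffmpeg", "-y", "-ss", str(timestamp_s), "-i", video,
            "-frames:v", "1", out_path]


def build_index(video: str, duration_s: float, n_frames: int,
                caption: Callable[[str], str],
                out_dir: str = "frames", extract: bool = True) -> List[Dict]:
    """Extract frames, caption each one, and return a list of index entries.

    `caption` is a hypothetical hook: it takes an image path and returns text.
    Set extract=False to skip ffmpeg (e.g. frames already exist, or for a dry run).
    """
    if extract:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
    index = []
    for i, ts in enumerate(sample_timestamps(duration_s, n_frames)):
        frame = str(Path(out_dir) / f"frame_{i:04d}.jpg")
        if extract:
            subprocess.run(ffmpeg_frame_command(video, ts, frame),
                           check=True, capture_output=True)
        index.append({"time_s": ts, "frame": frame, "caption": caption(frame)})
    return index


def search(index: List[Dict], term: str) -> List[float]:
    """Return the timestamps whose caption mentions term (case-insensitive)."""
    term = term.lower()
    return [entry["time_s"] for entry in index if term in entry["caption"].lower()]
```

With a real captioner plugged in, something like `search(build_index("movie.mkv", 7200, 100, my_captioner), "car")` would give you the timestamps worth jumping to. Substring matching is the crudest possible index; it is only there to show the shape of the thing.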
I got a shirt I liked from a conference, and I didn't know who made it. It was soft, fit comfortably... I took a picture of some random numbers on a tag and Gemini parsed out the numbers and found the manufacturer. Pretty neat.
I don't know what model runs in my phone's Google Translate app, but whatever it is, it's so bad that it does a disservice to Google's models. It's amazing at picking up speech spoken directly into the unit, but if you try to hold any kind of conversation or listen to anything even a little bit far away, it falls apart completely and is good for basically nothing.
This is obviously different from the models most people are discussing here, which are much bigger. But it's damaging the Gemini brand in general, by association if nothing else.