Yep. This example basically convinced me that they were unable to figure out anything actually useful to do with the model's new capabilities. Which makes me wonder how capable the new model in fact is.
Yah, pretty sure it is the same feature that's been in Bing Chat for 2 months now. It really feels like there's only one pass of feature extraction from the image, which prevents any detailed analysis beyond a coarse "what do you see". (Follow-up questions about things it likely didn't parse get highly hallucinated answers.)
This is why they can't extract the seat post information directly from the bike when the user asks. There's no "going back and looking at the image".
Edit: nope, it's a better image analyzer than Bing