by tkgally 545 days ago
A few days ago, OpenAI released live video integration with Advanced Voice mode for ChatGPT—point your phone at something and ask what it sees, and it will tell you pretty accurately. I thought it was just a cool trick until I read the top comment on their YouTube video announcement: “I'm screaming. As a visually impaired person, this is what I was eagerly waiting for. Still screaming! Thank you, Sam, Kev and the entire team over at OpenAI.”

https://www.youtube.com/live/NIQDnWlwYyQ

Google released a similar feature with Gemini 2.0 last week. While it doesn’t seem to be integrated with a smartphone app yet (at least on iOS), it can be used through the AI Studio browser interface.

https://news.ycombinator.com/item?id=42394998

1 comment

Is this feature somehow different from what Google has had with Lens and what Apple has had with the info button in regular photos for a while now?
It uses the live video feed, and you can talk with the LLM.