by jeroenhd 37 days ago
The video they show (which is probably exaggerated by cutting out LLM generation time) is pretty sci-fi. I don't know how it works in practice, but it looks fun to try out. If this could run locally, I'd love to have a feature like that.

Most people don't really seem to care about data collection when it comes to AI usage. A lot of people who feed every detail of their lives to Gemini, ChatGPT, Bing, Claude, Mistral, or shady clusters across the internet selling inference at bargain-bin prices will probably be fine with Gemini as long as it doesn't interfere unnecessarily.


It probably works similarly to how Gemini has worked on Android for a while now.

You can point at or select anything on the screen, and it understands and searches the context. If you select a text block, even text inside an image, it lets you copy the text or search for it online. Otherwise, it can search the image.

I use it often. It's intuitive and fast even on non-flagship phones.

I'd wager their A/B tests went well enough to warrant a port from phones to their new "Chromebook".

Their video is completely different from what Gemini does now. It analyses mouse movements, like circling around things, underlining things with the mouse, pointing at things to indicate where they need to go. It's a lot like the interfaces you might see in sci-fi movies, where generic gestures are understood within context in a way that modern computers can't handle.
> circling around things, underlining things with the mouse

Do we use the same Android Gemini assistant?

Because the one I use does that, and it has object detection smart enough to be intuitive. It usually gets it right when I point at something on the screen. And when it doesn't, I can circle the thing or just click again.

Take this Instagram post, for example: it automatically highlighted the entire person, but I wanted to know about the shoes. I clicked once on the shoes, and it knew exactly what I wanted and gave me the info in about 2 seconds:

https://imgur.com/a/lHUeciy

This is useful to non-tech-savvy folks, not just to us hackers.

Google's Gemini features differ per region to a massive extent. There's a good chance privacy laws prevent Google from providing me with the same Gemini you use.

Object detection is mediocre at best. Circling things and using their AI editing features works, but the artefacts confuse Lens and other image parsing systems. Extracting objects from images mostly works, but it's not on par with what Apple had long before Google built it.

The difference remains that the Gemini app on Android requires activation. You cannot tap a button or click a link while you're on the Gemini screen.

The video isn't on the linked page anymore, but it's here: https://deepmind.google/blog/ai-pointer/ and here: https://www.youtube.com/watch?v=pZNzfQLgGsA

It's an absolute privacy nightmare for most people, but if we ever get enough RAM and compute to run this stuff locally, I think it could actually create a new paradigm for user interaction: something with Lisp machine self-customisability, but for people who don't know anything about computers.

And if it doesn't work, it'll be the most horrific, messy, useless UI humanity has ever invented, and we all get a new funny meme to laugh at Google with. Win-win!

> Most people don't really seem to care about data collection when it comes to AI usage.

That assumes you intended to use AI. People are going to accidentally upload random private content to Google.

If you buy the Google Gemini AI Agentic Laptop, or whatever they end up marketing this as, you're going to want to try AI. Otherwise, what is the point of buying a Chromebook, however nice and slick it may look, when similar or even better alternatives exist?