Show HN: Gemini Cursor – A Multimodal AI Cursor for Your Desktop (Open Source)

22 points by 13point5 495 days ago

I built Gemini Cursor, an open-source multimodal AI cursor that guides users through tasks on their desktop by pointing and speaking.

It leverages Gemini 2.0 Flash and Google's live multimodal API to analyze what's on screen and provide real-time assistance.

In this demo, my friend tries to add a payment method to Amazon, and the AI cursor walks them through the entire process with visual cues and spoken instructions.

I've also used it to interpret diagrams from research papers. I'm curious to see what other use cases people find for it!

Demo: https://x.com/27upon2/status/1889128655672029582

Repo: https://github.com/13point5/gemini-cursor
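For anyone wondering how the pointing step might work, here is a minimal sketch and not the code from the repo. It assumes the model returns the target element as a bounding box in the form `[ymin, xmin, ymax, xmax]`, normalized to a 0-1000 range, and it turns that box into a pixel position for the cursor. The function name `box_to_cursor_point` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def box_to_cursor_point(
    box_2d: Sequence[float],
    screen_width: int,
    screen_height: int,
    scale: float = 1000.0,
) -> Tuple[int, int]:
    """Map a normalized bounding box to the pixel at its center.

    Assumption: box_2d is [ymin, xmin, ymax, xmax] on a 0..scale range.
    """
    if len(box_2d) != 4:
        raise ValueError("expected [ymin, xmin, ymax, xmax]")
    ymin, xmin, ymax, xmax = box_2d

    # Center of the box, still in normalized units.
    center_x = (xmin + xmax) / 2.0
    center_y = (ymin + ymax) / 2.0

    # Scale to pixels, multiplying before dividing to limit float error.
    x = round(center_x * screen_width / scale)
    y = round(center_y * screen_height / scale)

    # Clamp so the cursor never lands outside the screen.
    x = min(max(x, 0), screen_width - 1)
    y = min(max(y, 0), screen_height - 1)
    return x, y


if __name__ == "__main__":
    # Hypothetical box for a button on a 1920x1080 screen.
    print(box_to_cursor_point([100, 200, 300, 400], 1920, 1080))
```

Pointing at the center of the box rather than a corner keeps the cursor on the element even when the box is slightly off.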

2 comments

wifipunk 493 days ago

This is a surprisingly useful base project for AI-assisted presentations, where you can point out various details hands-free or type at the same time.

Hope you keep working on the project. Different cursor settings for size, shape, and color would be nice.

jcmp 494 days ago

Wow! Especially smart to use a separate API call for the position of the element.
