Hacker News
by cristyansv 875 days ago
Imagine porting this to a dedicated app that can access the context of the open window and the text on the screen, providing an almost real-time assistant for everything you do on screen.

Automatically take a screenshot and feed it to https://github.com/vikhyat/moondream or something similar? Doable. But while very impressive, the results are a bit of a mixed bag (some hallucinations).
I'm sure something like the accessibility API would have lower latency.

https://developer.apple.com/library/archive/samplecode/UIEle...
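A minimal sketch of the screenshot-and-feed-to-a-model loop described above. It assumes two hypothetical hooks: `capture` (returns screenshot bytes) and `describe` (sends an image plus a prompt to a vision-language model such as moondream and returns text). Neither is a real library API; a real version would wrap an actual screen grabber and the model's own interface. The one design choice shown is skipping unchanged frames, since the model call is the slow, hallucination-prone part.

```python
import hashlib
import time
from typing import Callable, Iterator, Optional

# Hypothetical hooks, not real APIs: plug in a screen grabber and a
# vision-language model (e.g. moondream) behind these signatures.
CaptureFn = Callable[[], bytes]           # returns encoded screenshot bytes
DescribeFn = Callable[[bytes, str], str]  # (image, prompt) -> answer text


def screen_assistant(
    capture: CaptureFn,
    describe: DescribeFn,
    prompt: str = "What is the user working on?",
    interval_s: float = 1.0,
    max_frames: Optional[int] = None,
) -> Iterator[str]:
    """Poll the screen and yield a model answer whenever the frame changes.

    Identical consecutive frames are detected by SHA-256 digest and skipped,
    so the comparatively slow model is only called when something changed.
    """
    last_digest = None
    frames = 0
    while max_frames is None or frames < max_frames:
        image = capture()
        frames += 1
        digest = hashlib.sha256(image).digest()
        if digest != last_digest:
            last_digest = digest
            yield describe(image, prompt)
        # Wait before the next poll, but not after the final frame.
        if max_frames is None or frames < max_frames:
            time.sleep(interval_s)
```

For example, feeding three fake frames `b"editor"`, `b"editor"`, `b"browser"` with `interval_s=0` and `max_frames=3` calls `describe` only twice. An accessibility-API approach would replace `capture` with a query for the focused window's text instead of pixels.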

rewind.ai seems to be moving in this direction
this looks equally scary and incredible, especially the "summarize what I worked on today" examples.
it works really well, and locally too!