by fudged71 951 days ago
Classic HN response.

This is just an early taste of a potentially powerful use case.

I understand the vision API doesn’t have memory, so each screenshot it takes is like an entirely new context. If the script/application is able to send WHAT application it’s in, and has some RAG database in the backend to pull knowledge from, this would be incredibly useful.
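
To make that concrete, here is a rough sketch of the idea, not how the demoed tool works. Everything in it is an assumption for illustration: the `KNOWLEDGE` list stands in for a real RAG store, `retrieve` uses naive keyword overlap instead of embeddings, and `build_prompt` only assembles the text that would accompany the screenshot. No vision API is actually called.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    app: str   # application the tip belongs to
    text: str  # the tip itself


# Hypothetical knowledge base; a real backend would be a vector store.
KNOWLEDGE = [
    Snippet("Blender", "Add a UV sphere with Add > Mesh > UV Sphere."),
    Snippet("Blender", "Press Tab to toggle between Object Mode and Edit Mode."),
    Snippet("Excel", "Use Insert > PivotTable to summarize a range."),
]


def retrieve(app, question, k=2):
    """Return up to k snippets for this app, ranked by shared words."""
    words = set(question.lower().split())
    candidates = [s for s in KNOWLEDGE if s.app.lower() == app.lower()]
    ranked = sorted(
        candidates,
        key=lambda s: len(words & set(s.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(app, question):
    """Assemble the text that would be sent along with each screenshot."""
    context = "\n".join(f"- {s.text}" for s in retrieve(app, question))
    return f"Application: {app}\nKnown tips:\n{context}\nQuestion: {question}"


print(build_prompt("Blender", "how do I add a sphere"))
```

Since every screenshot is a fresh context, the app name and retrieved tips get resent on each call; that is the whole "memory".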

Of course it’s slow now. If you’re legitimately stuck, a couple seconds for a personalized answer is a perfect trade-off. It will get better.

3 comments

I think every UI application should start logging the actions the user takes so that AI could learn the mappings from actions to visual output. It would be an amazing form of data.
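
A minimal sketch of what such a log could look like, assuming one JSON line per action with references to the screenshots taken before and after it. The field names, the `log_action` helper, and the file names are all made up for illustration; no existing application logs in this format as far as I know.

```python
import io
import json
import time


def log_action(stream, action, target, before, after):
    """Append one action record as a JSON line and return it."""
    record = {
        "ts": time.time(),   # when the action happened
        "action": action,    # e.g. "click", "keypress"
        "target": target,    # UI element or menu path acted on
        "before": before,    # screenshot reference before the action
        "after": after,      # screenshot reference after the action
    }
    stream.write(json.dumps(record) + "\n")
    return record


# Usage: an in-memory buffer stands in for a log file.
buf = io.StringIO()
log_action(buf, "click", "Add > Mesh > UV Sphere", "frame_0001.png", "frame_0002.png")
print(buf.getvalue())
```

Each line pairs an action with its before/after frames, which is the action-to-visual-output mapping a model would train on.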
I could say your comment is a classic 2023 HN comment..? There is no reason to be overly optimistic about other people’s products. Plus, nobody said “oh wow this will never work”, it’s just currently quite bad.
I couldn’t hear it perfectly, but I’m pretty sure the instructions it provided were to transform the vertices of the cube to make the sphere. It’s like using MS FrontPage. It may look right, but it’s a convoluted mess underneath.