HN Mirror



	by patientzero 851 days ago
	Unless I've misunderstood, it works best on a picture of text, and it has to answer in text. It is extremely difficult for it to guide you through a GUI, or to give you a sequence of steps you might want to tweak slightly, without forcing you to study exactly what it is doing, as opposed to just copying and pasting text into a text UI. I find it hard to imagine whether multiple AGI-wrapped interfaces could use some other kind of input, i.e. emulated remote desktops and screen shares (and whether those could be chained adequately, with one AGI's output feeding another interface's input). But I feel like adding all of this data ultimately makes it harder to proofread and adapt what the AGI proposes, and then to automate its repeated use (the way you can with scripts or code).


muzani 851 days ago

It definitely guides me through GUI.

One of my other top use cases for it is getting it to read docs. It will give me step-by-step instructions to, say, deactivate Facebook or do whatever with AWS. Sometimes I get stuck, so I send it a screenshot and it'll tell me that the button is actually a tab, or that it's on the left, or that I need to scroll down, etc.

Chained data will likely have a hard time. Most of these wrapper startups will probably have a hard time. I tried to make an AI wrapper startup but I couldn't. It's a rare time where the unicorn with huge teams is actually moving faster than the solo devs. It's almost like they were aided by AI or something.

patientzero 851 days ago

For example, when it gave me instructions for Evolution mail settings, it hallucinated a button. To find that out, I had to correctly follow half the instructions and then reread them carefully. With a text UI, you paste the instructions in and the interpreter flags the first incorrect line.

I think it has always been the case that a GUI is bad for communicating instructions about how to use it. In a world where everyone is essentially an autodidact and gets very little help, a GUI could win on other things, such as helping the user figure out what to do or remember how to do it.

I'm not really focusing on the GPT interface itself here. If it could wrap all sorts of interfaces, then we would rarely need to use the underlying text or GUI interfaces directly. But I think such AI interfaces would put themselves at a disadvantage by not using text as the medium between them.