by patientzero
851 days ago
Unless I've misunderstood, it works best when given a picture of text, and it has to answer in text. It is extremely difficult for it to guide you through a GUI, or to give you a sequence of steps you may want to correct a little, without forcing you to study exactly what it is doing; with a text UI you can just cut and paste. It's hard for me to imagine whether multiple AGI-wrapped interfaces could use some other input, e.g. emulated remote desktops and screen shares (and whether that could be chained well enough for AGI output to feed another interface's input), but I feel like adding all of this data ultimately makes it harder to proofread and adapt what the AGI proposes and then automate its repeated use (as you can with scripts or code).
One of my other top use cases for it is getting it to read docs. It will give me step-by-step instructions to, say, deactivate Facebook or do whatever with AWS. Sometimes I get stuck, so I send it a screenshot and it'll tell me that the button is actually a tab, or is on the left, or that I need to scroll down, etc.
Chained data will likely have a hard time. Most of these wrapper startups will probably have a hard time too. I tried to make an AI wrapper startup but I couldn't. It's a rare case where the unicorns with huge teams are actually moving faster than the solo devs. It's almost like they were aided by AI or something.