"There is nothing, I repeat nothing, more intimidating than an empty text box staring you in the face." Talk about a hyperbolic opening line. Is it really that intimidating to have an empty text box on WhatsApp or your favorite SMS app? No, because you expect an appropriate response from the other side, pretty much regardless of what your input is. As a frequent user of ChatGPT, I've come to expect the same there. And it works great, without me having to study any "prompt engineering". In fact, as it gets updated, I get frustrated less and less often, unlike my experience using Bard, which can be better for a few tasks but often returns opaque errors that do feel frustrating.

The solution here is clearly for the model to improve, and one doesn't even need a leap of faith: just look at what OpenAI is already delivering! Talking to a competent LLM is nothing like talking to Bash or DOS. I also get frustrated when I sometimes have to ask for the same thing in a slightly different way... but that's still almost always faster than searching for the right button or submenu in most creation-oriented software. Whoever is waiting for Word or Google Docs to add a "write this in business-formal email tone" dropdown menu to the UI clearly hasn't grokked the true shift we're about to go through in computing.

Incidentally, I often use ChatGPT to help me with more advanced or rarely used tasks in software from Avid Pro Tools to Adobe Premiere. And I can't remember a single time when doing this was slower or more frustrating than turning to either Google or the software's own "help" section.

Of course we'll have more input options. That makes tons of sense for things like image or video generation. I bet the models will also soon be outputting more and more "interactive elements" that will aid in refining results. But I have a feeling the opening text box (or, better yet, the open ears of a friendly audio assistant) is here to stay.
It’s interesting you mention this. I’ve been wondering about this for a while now: there have been leaps recently in LLMs, speech synthesis, and speech recognition. There are sophisticated language models, computer voices that are hard to distinguish from real humans, and software that can reliably understand even the worst recording of someone speaking. Yet those three components still have not been integrated into a next-generation Alexa. But why? It doesn’t even sound particularly complicated (on the scale of all the prior art necessary).