> It seems that if we ultimately want to "move at the speed of thought," it will require speech.
Except for the large majority of people who read, type, and click way faster than they can talk. Especially for visual things it’s way faster to drag a rectangle than to describe what you want.
A lot of us also aren’t linear verbal thinkers. It would take minutes to hours to verbalize concepts we can grasp visually/schematically in seconds.
Most people speak at about 150 wpm, but very few can type that fast. Reading and gesturing are fast, though, and that is what TFA is about: combining reading and gesturing with speech.
You rarely need 150 wpm when typing. If you try dictation, you’ll notice that half those words are error correction, checksum bits, and turn-taking filler.
I usually convey the same meaning at 80 wpm typing. It makes it faster to read, too.
Maybe I’m just slightly ADHD, but listening to people talk drives me crazy. Get to the point! It's much easier if they type it out.
Aren't "drag the rectangle" and visual interaction exactly the point of the research in the article? Speech is the perfect side channel to this interaction, not a context switch to text.
Also, I doubt DeepMind is designing for existing programmers and savvy computer users. They are thinking about the other billions of people in the world. Speech is the skill people will already have, not typing.
I suppose the idea is that the AI is going to do the "editing" for you (with all the consequences for "thinking" that implies).
You don't have to think about the design of your app. You just say what you want and the AI makes it appear. If you don't like something, you tell the AI to change it. You iterate live until you get the final result you want.
This is what writing docs has become for me. I have the agent make a draft, then tell it which sections to rewrite, combine, etc. I tell it the ideas I forgot to include. I manually make certain word-choice changes. The question is how you extend this flow to scenarios that aren't pure text. For most people, just talking about what you see is probably the easiest.
Great book on visual thinking: https://www.goodreads.com/book/show/60149558-visual-thinking