by anon84873628 41 days ago
It seems that if we ultimately want to "move at the speed of thought," it will require speech.

> It seems that if we ultimately want to "move at the speed of thought," it will require speech.

Except for the large majority of people who read, type, and click way faster than they can talk. Especially for visual things, it’s way faster to drag a rectangle than to describe what you want.

A lot of us also aren’t linear verbal thinkers. It would take minutes to hours to verbalize concepts we can grasp visually/schematically in seconds.

Great book on the topic: https://www.goodreads.com/book/show/60149558-visual-thinking

Most people speak at about 150 wpm, but very few can type that fast. But reading and gesturing are fast, which is what TFA is about: combining reading and gesturing with speech.

You rarely need 150 wpm when typing. If you try dictation, you’ll notice that half those words are error correction, checksum bits, and turn-taking filler.

I usually convey the same meaning with 80 wpm typing. Makes it faster to read too.

Maybe I’m just slightly ADHD – listening to people talk drives me crazy. Get to the point! Much easier if they type it out.

> listening to people talk drives me crazy.

People have so many verbal tics and filler words too. Anthropic’s Dario says “you know” after every third word, for example.

Or they meander around unrelated/unimportant details.

Aren't "drag the rectangle" and visual interaction exactly the point of the research in the article? Speech is the perfect side channel to this interaction, not a context switch to text.

Also, I doubt DeepMind is designing for existing programmers and savvy computer users. They are thinking about the other billions of people in the world. Speech is the skill people will already have, not typing.

There's the adage that writing is thinking, but even more accurately, at least for me, editing is thinking.

Neither typing speed nor dictation speed is a true bottleneck, but editing speech seems like it'd be harder than editing text.

Though there may be some hybrid approach that can work well.

I suppose the idea is that the AI is going to do the "editing" for you (with all the consequences for "thinking" that implies).

You don't have to think about the design of your app. You just say what you want and the AI makes it appear. If you don't like something, you tell the AI to change it. You iterate live until you get the final result you want.

This is what writing docs has become for me. I have the agent make a draft, then tell it which sections to rewrite, combine, etc. I tell it the ideas I forgot to include. I manually make certain word choice changes. The question is how you extend this flow to non-pure-text scenarios. For most people, just talking about what you see is probably the easiest.

> editing is thinking.

I hadn’t realized until just now how accurate that is for me as well. Thank you.