Voice Assistant for VSCode (github.com)
125 points by b4rtaz__ 1853 days ago
12 comments

This looks pretty cool, but it’s Windows-only and doesn’t work outside of VS Code.

If you want something more fully featured that works everywhere, I’ve used https://talonvoice.com/ for a while now.

I just had a quick try of Talon on Windows, using wav2letter (I don't have Dragon) and the recommended scripts[0]. It... doesn't work well for me at all.

I was able to get a couple of simple commands to work in Chrome, sometimes, such as "reload" and "show history". In Visual Studio Code, it just spouted a bunch of errors in the console [1], and in JetBrains Rider all it would do is type gobbledygook, like a cat had walked on the keyboard or something. Pretty disappointing :(

The logs also fill up with "WARNING actions: skipped because they have no matching declaration: (user.select_next_token)".

It was a bit confusing to use too (apart from not really working, I mean!), as it wasn't clear if I had to use some kind of command to enable voice commands, or if it was listening all the time. Eventually I figured out that it seems to be the latter, but still, it's not clear what commands it has heard and understood - I found myself speaking and nothing was happening, and I had no idea what it had understood. Similarly, I'd say something like "close tab", and it would type some nonsense like "aa&" into the current file - again, no idea what command it was actually trying to use.

[0] https://github.com/knausj85/knausj_talon [1] "No such file or directory: 'C:\\Users\\MyUser\\AppData\\Local\\Temp\\vscode-port'"

I recommend asking about hiccups on the Slack [1]. My basic analysis is you're missing a vscode-side plugin, you might need to restart Talon once after dropping in knausj_talon, and that your statement of "it doesn't really work" comes mostly from you guessing command phrases that don't exist. It's a strict command system - you need to learn/know the commands, you can't say just anything and expect it to work.
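
To illustrate what "strict" means here, below is a minimal Python sketch of an exact-phrase command table. It is not Talon's implementation or API: the action names are hypothetical placeholders, and the phrases are only the ones mentioned in this thread. The point is just that an utterance either matches a known phrase or does nothing.

```python
from typing import Optional

# Hypothetical command table for illustration only.
# The action names on the right are made up; they are not Talon's API.
COMMANDS = {
    "command history": "show_command_history",
    "dictation mode": "enter_dictation_mode",
    "command mode": "enter_command_mode",
    "help alphabet": "show_alphabet_help",
}


def match_command(utterance: str) -> Optional[str]:
    """Return the action for an exactly matching phrase, or None."""
    # Normalise case and whitespace, but do no fuzzy or partial matching.
    phrase = " ".join(utterance.lower().split())
    return COMMANDS.get(phrase)


print(match_command("Command History"))
print(match_command("please show me the history"))
```

In this sketch a guessed phrase such as "please show me the history" simply returns `None`, which is roughly the "nothing happens" experience described upthread.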

It's a tool to be learned and practiced. It's not fully optimized for the out-of-the-box experience (yet); it's currently more optimized for customization and total control by people who have the time and motivation to go hands free (e.g. due to limited motor function).

This is what it can look like if you practice a bit: [2]

---

Some recommendations:

- say "say hello world"

- say "help alphabet"

- say "help context"

- say "command history"

- say "dictation mode" then speak freely, then say "command mode"

- Try chaosparrot's Talon Practice [3]

[1] https://talonvoice.com/chat

[2] https://twitter.com/lunixbochs/status/1378159234861264896

[3] https://chaosparrot.github.io/talon_practice/lessons/formatt...

I wasn't attempting to use it as a support forum, more venting after trying it, as I was hoping for something that Just Works(TM) at least for the basics, without me having to spend weeks hacking around with it. I'm quite happy to spend some more time tuning and customising, but only if I get the basics working easily, I'm sure it understands me, and I'm sure it's going to be worth the time.

I hope you don't think I'm trying to be mean, it was just a really disappointing first experience. It might be that my expectations were misaligned.

I had actually already installed the VSCode plugin and restarted both VSCode and Talon (I don't remember if I saw it in a comment in a .talon file, or if I saw it in the console logs, but somewhere it told me to install a plugin). Similarly, I installed the Rider/IDEA plugin.

I wasn't quite just saying anything :) Tho it wasn't clear what commands were actually accepted; from looking in the .talon files, I didn't see anything like a string literal, more something code-like. I had to guess at what commands were supported, for example "reload" in Chrome (assuming that's actually a correct command, even that simple command only worked some of the time).

I'm willing to give it another go, but is there a getting started guide/tutorial for how to actually use it, and how to see what it's trying to do when it does something? I used the getting started guide on the Talon site, but that only tells me how to install it, not how to use it.

I edited the parent since you started writing this post, maybe re-read the parent as I preempted some of the questions upthread.

For browser stuff, I'm actually really disappointed with how the "actions implemented in talon files" feature turned out in practice (which is the code-like action(...): syntax you saw), and I'm planning to deprecate it, which should clear that up a bit. Browser commands come from places like generic_browser.talon and tabs.talon. Looks like reload is "reload it"

Besides the tips I gave and the chaosparrot practice, knausj_talon does also have a getting started section in the readme https://github.com/knausj85/knausj_talon#getting-started-wit...

You can get significantly more insight into what's happening by both saying "command history", and opening the repl and running `events.tail()` in it.

There's also a much more accurate speech engine in beta right now, which will be released soon, but I suspect most of the confusion wasn't accuracy related.

Thanks, appreciate the extra information - I plan on giving it another go tomorrow, armed with this new info!

Something else I wanted to ask about - does the voice recognition engine (either wav2letter or the new one you mention) adapt/learn according to the individual using it? I have a fairly strong Scottish accent, and would prefer to speak naturally if possible.

Yeah, my brief experience with and impression of these voice-assisted coding tools is that they require a $300 microphone and a quiet room to get an acceptable level of accuracy.

It probably is worth it for physically impaired people (but I fear what 6 hours daily of this will do to their vocal cords). I am more interested in BCI technology, which is where I see the future.

I'm a non-native speaker and have been successfully using Talon in an open-office environment with a $20 mic - just to offer a different point of view. I've been using it for coding, not so much dictation so YMMV, but expensive mics, library environments or American news anchor accents are absolutely not required.
Talon doesn't require a $300 mic or a completely silent room. It works fine on my macbook air's builtin mic, and explicitly handles some interesting background noise scenarios (e.g. loud music, sitting next to a running dryer).
Just to be sure, I'd tried it with my laptop's built-in microphone, a good, Plantronics Bluetooth headset, and a good USB headset. Same result.
Here's a good talk from a senior (now staff) engineer at Fastly who uses Talon daily: https://www.youtube.com/watch?v=YKuRkGkf5HU
This is supposed to be Free/Libre. Is the source code somewhere? Can't seem to find it. Also the homebrew install seems to be broken.

https://talonvoice.com/update/pgUuEYK3vzmYQtF2PMgOyK/appcast...

Talon is not open source, but it's free to use.
The author of Talon doesn't support homebrew installation. Other people have tried to add it, but installing that way doesn't work. Have you tried the installer on the website?
Talon is free/gratis - It's not open source.
At least it appears to be high level.

For years people would always comment “I can type faster”, not realizing that we should also be able to make it smarter than word by word, or character by character.

https://youtu.be/hGPNs5C1Lp0

Notice this guy is also using his “hat” as a pointing device

Neat but it seems like your neck would pay a price.

The voice commands are also cool but needing to pause between each one seems like a huge drawback, compared to typing where I can just blaze through.

You don't really need to pause in Talon if you set it up right and practice: https://twitter.com/lunixbochs/status/1378159234861264896
He already paid a price with his hands.

That’s why he was forced to find other solutions

Eye tracking would be cooler but keyboard/mice alternatives are slow to appear

True but I'd take sore hands over a sore neck and back any day.
This is pretty nice :)

My first thought was that our eyes and hands do all the work; our mouth and ears are untapped resources in the quest to become true 100x engineers ;)

All joking aside, I am interested in how well this might work outside of a11y use-cases. Speaking is just so natural. It doesn't have to be used exclusively but I do want to find out if there are cases where it's just nicer to say a command during coding than remembering all kinds of keyboard shortcuts. I always wonder if a more hybrid approach of using touch, speaking and typing for various situations could feel better than keyboard all the way.

Don't forget to use your feet https://news.ycombinator.com/item?id=26430466
Dance Dance Recursion
I've often imagined a voice-command layer for an OS or apps that doesn't steal focus. So, you could be working in a particular app, and be giving instructions by voice to prepare other parts of your workflow.
A voice layer on an iPad could make it a better productivity device.
As long as we're limited to reading western/English we're underutilizing our eyes. Try coding with Emoji & Pinyin using http://github.com/elasticdotventures/_b00t_

Our upper primate brains are actually MUCH better at pattern matching than reading!

Would be nice to use for opening files if that could be configured to work in a suitable fashion.
A hybrid approach should really be handy.

I don't think voice-only coding is becoming standard for able-bodied people anytime soon.

There was a great talk from 2013 about coding with voice commands: https://www.youtube.com/watch?v=8SkdfdXWYaI

It's by a developer who developed RSI and had to find another way to write code. He uses a combination of Dragon and custom Python scripts to control Emacs.

The fascinating bit for me was the language he created around text navigation and manipulation. Lots of custom short words to optimise the amount of speaking he actually had to do.

Really worth a watch for anyone interested in this. If you want a quick demo, this part of the video is fairly representative: https://youtu.be/8SkdfdXWYaI?t=1034

Looks nice! Here's the source for the server component: https://github.com/b4rtaz/voice-assistant-net-server/blob/ma...
If this is interesting, check out https://serenade.ai/

I’ve only taken it for a test run but it seems really good and smooth.

Incompatible with "open plan" offices, one or the other will win.

Guess which one I'm rooting for. :-)

Is this using IBM Watson for STT? Are you assuming the costs?

Or does it use some Windows dictation API?

It's using the Windows API.
This makes it more accessible for many!
How is the multilinguality achieved? By code snippets or is there some language server support?
You put voice-assistant.json with your snippets to the root folder of the project. You can define in this file what you want. Check this example: https://github.com/b4rtaz/voice-assistant/blob/master/media/...
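
As a rough illustration of why per-project snippets make this language-agnostic, here is a small Python sketch that maps a spoken phrase to a block of text loaded from JSON. The schema below (`snippets`, `phrase`, `body`) is an assumption made up for this example; the real voice-assistant.json format is in the linked example and may look quite different.

```python
import json
from typing import Dict, Optional

# Assumed schema for illustration only; the actual voice-assistant.json
# format may differ. The snippet bodies are plain text, so they can be
# written in any programming language.
EXAMPLE_CONFIG = """
{
  "snippets": [
    {"phrase": "log value", "body": "console.log(value);"},
    {"phrase": "print value", "body": "print(value)"}
  ]
}
"""


def load_snippets(text: str) -> Dict[str, str]:
    """Build a phrase -> snippet body lookup from the JSON text."""
    config = json.loads(text)
    return {entry["phrase"].lower(): entry["body"] for entry in config["snippets"]}


def expand(snippets: Dict[str, str], spoken: str) -> Optional[str]:
    """Return the snippet for a recognised phrase, or None if unknown."""
    return snippets.get(spoken.strip().lower())


snippets = load_snippets(EXAMPLE_CONFIG)
print(expand(snippets, "Log Value"))
```

Because the lookup only pastes text, nothing in this sketch needs to understand the target language, which would be consistent with the "works with every language" claim.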
I think the next step is getting a huge catalogue of these actions defined. We speak tons of "codes" already in our lives; learning some more to speak (programming) code will not be too hard, I guess.
I'm also wondering this. He says that it works with _every_ language, which leads me to believe that all it does is to paste in code snippets.
Looks very cool, well done!
I cannot get it to recognise any commands unfortunately.
That’s 14 years ago. I would have expected voice programming to be solved by now. The original iPhone was introduced in 2007.

Ray Kurzweil’s predictions are taking longer than expected

https://singularityhub.com/2015/01/26/ray-kurzweils-mind-bog...