by GordonS 1851 days ago
I wasn't attempting to use it as a support forum, more venting after trying it, as I was hoping for something that Just Works(TM), at least for the basics, without me having to spend weeks hacking around with it. I'm quite happy to spend some more time tuning and customising, but only if I get the basics working easily, I'm sure it understands me, and I'm sure it's going to be worth the time.

I hope you don't think I'm trying to be mean, it was just a really disappointing first experience. Might be my expectations were misaligned.

I had actually already installed the VSCode plugin and restarted both VSCode and Talon (I don't remember if I saw it in a comment in a .talon file, or if I saw it in the console logs, but somewhere it told me to install a plugin). Similarly, I installed the Rider/IDEA plugin.

I wasn't quite just saying anything :) Tho it wasn't clear what commands were actually accepted; from looking in the .talon files, I didn't see anything like a string literal, more something code-like. I had to guess at what commands were supported, for example "reload" in Chrome (assuming that's actually a correct command, even that simple command only worked some of the time).

I'm willing to give it another go, but is there a getting started guide/tutorial for actually using it, and for seeing what it's trying to do when it does something? I used the getting started guide on the Talon site, but that only tells me how to install it, not how to use it.


I edited the parent after you started writing this post; maybe re-read it, as I preempted some of these questions upthread.

For browser stuff, I'm actually really disappointed with how the "actions implemented in talon files" feature turned out in practice (which is the code-like action(...): syntax you saw), and I'm planning to deprecate it, which should clear that up a bit. Browser commands come from places like generic_browser.talon and tabs.talon. Looks like reload is "reload it".

Besides the tips I gave and the chaosparrot practice, knausj_talon does also have a getting started section in the readme https://github.com/knausj85/knausj_talon#getting-started-wit...

You can get significantly more insight into what's happening by both saying "command history", and opening the repl and running `events.tail()` in it.

There's also a much more accurate speech engine in beta right now, which will be released soon, but I suspect most of the confusion wasn't accuracy related.

Thanks, appreciate the extra information - I plan on giving it another go tomorrow, armed with this new info!

Something else I wanted to ask about - does the voice recognition engine (either wav2letter or the new one you mention) adapt/learn according to the individual using it? I have a fairly strong Scottish accent, and would prefer to speak naturally if possible.

First - online learning is generally not necessary with my Conformer model (which is generally >5x better at not making errors than the current public model). It's also quite good at accents. My overall goal is to ship speech models so good you won't actually wish for model training.

I view fully automatic online training as a sort of anti-pattern - Dragon does that and it will randomly forget entire words. Talon may eventually have some kind of process for self-serve model training. I do have some plans for what that might look like.

Even without automatic model training there's already a feature to automatically create a sort of "personal dataset" as you use Talon, which you can use to train speech models (Talon or otherwise) down the line, or even send me to improve the main model.

I gave it another try - `events.tail()` was useful to know what Talon was trying to do, but it is very busy. However, I came across https://talon.wiki/getting_started - not sure if this is an official Talon site, but I found it to be a really helpful resource, certainly more so than https://talonvoice.com/docs. From here I found `command history`, which shows exactly what commands Talon is hearing - great!

I also found that the microphone makes a big difference - with my Plantronics Voyager Legend bluetooth headset, it was basically unusable, misunderstanding almost everything I said. But if I used a cheap Logitech USB headset that I've had for a decade, alphabet accuracy was good.

Something else is that it does seem to struggle a bit with my accent. For example, with the alphabet I would say "air", and 75% of the time it would hear `oh`/`near` - however, if I said "air" in an American accent, it heard `air` correctly every time.

Will be interesting to see how your new engine fares when it's released.

I did tell you to try "command history" on the same line I told you about events.tail(). You can pass a string argument to events.tail() to filter it (which works even better in the beta/next version). I'm considering modifying the events.tail() behavior to filter / differentiate human-triggered events vs machine triggered events, which should clean up the default output a bit. One of the things I'm strongly considering for the next release is a "subtitle" feature that's enabled by default to tell you what is recognized.

The wiki is something I currently try to introduce folks to later in the process, because it's unofficial and historically has had assumptions, inaccuracies, or very outdated information that caused me additional stress/support load. I know the community has been working on improving that.

Bluetooth mics are almost universally worse than cheap wired mics, due to bandwidth/power/compression constraints. If you make a file user/settings.talon containing "settings(): speech.record_all = 1", Talon will record successful utterances to recordings/ adjacent to user/, and you can compare what the mic sounds like to Talon. It's also very likely the mic works better with Conformer.
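
For anyone who would rather script the settings file than create it by hand, here is a minimal Python sketch. The multi-line indented layout of the `settings():` block, the helper name `write_record_all_settings`, and the idea that your user directory lives somewhere like `~/.talon/user` are all assumptions on my part, not something confirmed above; only the file name `settings.talon` and the `speech.record_all = 1` setting come from the comment. The demo writes into a temporary directory so it doesn't touch a real Talon install.

```python
import tempfile
from pathlib import Path

# Assumed layout of the settings block; the comment above only quotes
# "settings(): speech.record_all = 1" without showing line breaks.
SETTINGS_BODY = "settings():\n    speech.record_all = 1\n"


def write_record_all_settings(user_dir: Path) -> Path:
    """Create settings.talon inside user_dir, refusing to overwrite an existing file."""
    user_dir.mkdir(parents=True, exist_ok=True)
    target = user_dir / "settings.talon"
    if target.exists():
        raise FileExistsError(f"{target} already exists; edit it by hand instead")
    target.write_text(SETTINGS_BODY, encoding="utf-8")
    return target


# Demo against a throwaway directory standing in for Talon's user/ folder.
with tempfile.TemporaryDirectory() as tmp:
    path = write_record_all_settings(Path(tmp) / "user")
    print(path.read_text(encoding="utf-8"))
```

To use it for real you'd pass your actual user directory instead of the temporary one, then look for a recordings/ folder next to it after speaking a few commands.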

The alphabet is pretty easy to change. Check out the top of keys.py. There are some words that aren't really the engine's fault when it comes to accent, and some pairs like air/near are more of a configuration issue if your accent doesn't differentiate them.
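
As a rough illustration of what changing the alphabet amounts to, here is a standalone Python sketch. It is modeled on my recollection of how keys.py pairs one spoken word with each letter; the word list, the variable names, the helper `replace_word`, and the replacement word "arch" are assumptions for illustration, so check the top of your own keys.py for the real definitions.

```python
import string

# Assumed default word list, one spoken word per letter in a-z order.
# Only "air" and "near" are confirmed by the thread above.
default_alphabet = (
    "air bat cap drum each fine gust harp sit jury crunch look made "
    "near odd pit quench red sun trap urge vest whale plex yank zip"
).split()


def replace_word(words, old, new):
    """Return a copy of words with old swapped for new, keeping letter order."""
    if old not in words:
        raise ValueError(f"{old!r} is not in the alphabet")
    if new in words:
        raise ValueError(f"{new!r} is already used for another letter")
    return [new if w == old else w for w in words]


# If your accent makes "air" and "near" hard to tell apart, give "a" a
# word that sounds nothing like the others ("arch" is an arbitrary pick).
my_alphabet = replace_word(default_alphabet, "air", "arch")

# Spoken word -> letter mapping, built by pairing words with a-z.
alphabet_dict = dict(zip(my_alphabet, string.ascii_lowercase))
print(alphabet_dict["arch"], alphabet_dict["near"])
```

The point of swapping a single word rather than rewriting the list is that the position in the list is what ties a word to its letter, so everything else keeps working.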

I'm hoping to release v0.2 with Conformer sometime around July 1