Hacker News
by dotancohen 1322 days ago
This is interesting. I wonder if it could be used in speech therapy, to help the speaker understand the difference between what a sound should sound like and the sound actually coming out of their mouth.

I personally have trouble pronouncing two different letters distinctly, and I just recorded myself trying to say both of them. Sure enough, there is very little difference in the way they are displayed. I will ask other people to say these letters and see if I can spot the difference. Then I should be able to play around with different lip, mouth, tongue, and larynx positions to see how close I can get.

Any tips for highlighting subtle differences in specific places would be appreciated!
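One rough way to make that comparison less dependent on eyeballing: slice both recordings into short overlapping frames, take a magnitude spectrum of each frame, and score how far apart the two spectra are in dB. The frames with the highest scores are the places worth zooming in on. This is a minimal, stdlib-only Python sketch under some assumptions: both recordings are mono float samples at the same sample rate and roughly time-aligned, the frame and hop sizes are arbitrary, and the function names (`frame_differences` etc.) are hypothetical, not part of the tool discussed in this thread. The naive DFT is only there to stay dependency-free; a real version would use an FFT.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Hann window, used to reduce spectral leakage at frame edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (fine for short frames)."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann(n))]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def log_spectral_distance(a: Sequence[float], b: Sequence[float],
                          eps: float = 1e-9) -> float:
    """RMS difference of the two spectra in dB; eps avoids log(0)."""
    diffs = [20 * math.log10((x + eps) / (y + eps)) for x, y in zip(a, b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def frame_differences(sig_a: Sequence[float], sig_b: Sequence[float],
                      frame_len: int = 256, hop: int = 128) -> List[float]:
    """Per-frame spectral distance between two recordings (hypothetical helper).

    Assumes equal sample rates and rough time alignment; only the overlapping
    length of the two signals is compared.
    """
    n = min(len(sig_a), len(sig_b))
    scores = []
    for start in range(0, n - frame_len + 1, hop):
        spec_a = magnitude_spectrum(sig_a[start:start + frame_len])
        spec_b = magnitude_spectrum(sig_b[start:start + frame_len])
        scores.append(log_spectral_distance(spec_a, spec_b))
    return scores
```

The index of the largest score times `hop`, divided by the sample rate, gives the time in seconds where the two takes diverge most. If the recordings are not aligned, this mostly measures timing differences rather than articulation.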


I would love to work on something like that, I have a couple ideas regarding ways to implement it.

However I don't know if I have the time to do it, as there are so many concepts I'm trying to channel through code. Hopefully as I build more computational tools I will increase my bandwidth.

Providing visual tools for phonetic assistance is definitely something I've had in mind for a while, and solving the dataset-building problem should take care of that along the way.

As of now, you can:
1) set the amount_sphere_tube slider to 1;
2) decrease the playback_rate to 0.1 (in the GUI on the right);
3) on the media controller, go to options (the naming might vary depending on the browser) and set the playback rate to normal or x1.

This should give you maximal temporal resolution.

One way to highlight features further is to generate embeddings with Magenta DDSP[0]. Speech sounds are fairly complex, though, so I don't know how well models built on music data would generalize to them. I think the tech to do this is there, but for the time being it seems to be scattered across fairly siloed fields. I also tried using live voice recordings, but there are latency issues with the subset of the Web Audio API I'm currently using. Still, there is definitely value in having live, spatial feedback on pronunciation.

[0]https://github.com/magenta/magenta-js/tree/master/music#ddsp
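If you do go the embedding route, two people rarely say the same letter with the same timing, so per-frame vectors have to be aligned before they can be compared. Dynamic time warping (DTW) is one standard way to do that. The sketch below assumes you already have one feature vector per frame from some model (DDSP or otherwise; it does not call Magenta) and uses cosine distance between frames. `dtw`, `cosine_distance`, and `worst_matches` are hypothetical helper names, not an existing API.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def cosine_distance(u: Vec, v: Vec) -> float:
    """1 - cosine similarity; treats an all-zero vector as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def dtw(seq_a: Sequence[Vec], seq_b: Sequence[Vec]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic O(n*m) DTW: returns total alignment cost and matched frame pairs."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, step = min((cost[i - 1][j - 1], 0), (cost[i - 1][j], 1), (cost[i][j - 1], 2))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n][m], path


def worst_matches(seq_a: Sequence[Vec], seq_b: Sequence[Vec],
                  path: List[Tuple[int, int]], top: int = 3) -> List[Tuple[float, int, int]]:
    """Aligned frame pairs with the largest local distance, highest first."""
    scored = [(cosine_distance(seq_a[i], seq_b[j]), i, j) for i, j in path]
    scored.sort(reverse=True)
    return scored[:top]
```

The pairs returned by `worst_matches` point at the frames where the two pronunciations differ most after timing differences have been warped away, which is roughly the "highlight subtle differences in specific places" question from upthread.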

Thanks for the tips.

Jot my email address down anyway, and if you ever get around to working on those speech tools I'll be thrilled to help.