| HN Mirror



	by fny 41 days ago
	It's possible to rely on mouth movements instead of sound. I've been tweaking visual speech recognition models (VSR) for the past few weeks so that I can "talk" to my agents at the office without pissing everyone off. It works okay. Limiting the language to commands like "move this" and "clear that", alongside context cues, vastly simplifies the problem and makes it far more feasible on-device. I think it's brilliant UX.
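
	A minimal sketch of the restricted-vocabulary idea, assuming the VSR model emits a noisy text transcript. The command list, the `snap_to_command` helper, and the 0.6 cutoff are all hypothetical and only illustrate how a closed command set turns open-ended recognition into a nearest-match lookup; it is not the setup described above.

```python
import difflib

# Hypothetical closed command set; only "move this" and "clear that"
# are named in the comment, everything else here is an assumption.
COMMANDS = ["move this", "clear that"]


def snap_to_command(raw_transcript, commands=COMMANDS, cutoff=0.6):
    """Map a noisy transcript to the closest known command.

    Returns None when nothing in the command set is similar enough,
    so unrecognized input is rejected instead of guessed at.
    """
    # Normalize case and collapse repeated whitespace before matching.
    cleaned = " ".join(raw_transcript.lower().split())
    matches = difflib.get_close_matches(cleaned, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for guess in ["moov this", "Clear  That", "open the pod bay doors"]:
        print(repr(guess), "->", snap_to_command(guess))
```

	With only a handful of valid commands, a fairly inaccurate recognizer can still land on the right one, which is the simplification the restricted language buys. Context cues could shrink the `commands` list further before matching.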

2 comments

swiftcoder 40 days ago

> I've been tweaking visual speech recognition models (VSR) for the past few weeks so that I can "talk" to my agents at the office without pissing everyone off.

Wouldn't SilentWhisper do just as good a job?


makeitdouble 40 days ago

No UX needs to be perfect for everyone, but this doesn't sound trivial to make reliable.

First things that came to mind:

  - facial hair
  - getting people to learn to make bigger mouth movements and not mumble
  - we're constantly self-correcting our speech as we hear our voice. This removes the feedback loop.
  - non-English languages (god forbid bilingualism)
  - camera angles and head movement

And that's after thinking about it for 30s. I'm sure there are some really good use cases, but will any research group or company push through for years and years to make it really good even if the response is lukewarm?


encom 40 days ago

> non-English languages (god forbid bilingualism)

In my experience, any combination of computers + speech + Danish has, so far without exception, been terrible. Last time I tested ChatGPT, it couldn't understand me at all. I spoke both in my local dialect and as close to Rigsdansk [π] as I could manage. Unusable performance, and in any case I should be able to talk normally, or there's no point. That was about a year ago. It may have improved, but I doubt it. I'm completely done trying to talk to machines.

Pre-emptive kamelåså: https://www.youtube.com/watch?v=s-mOy8VUEBk

[π] https://en.wikipedia.org/wiki/Danish_language#Dialects
