by theflyinghorse 789 days ago
1.3s imo is a fine time frame to start actually speaking. Humans, well, most of us anyway, don’t start speaking informative words right away. Instead we add in “umm”s, inhales, “mhm”s, “yeah…”s and so on. I think your approach is a good one. I’m now wondering: for these filler sounds, do you contextualize them somehow? That is, do you make the filler feel more natural?
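The filler trick described above can be sketched as a latency mask. This is a minimal sketch, not the poster's implementation. It assumes two hypothetical async callbacks: `generate_reply` (produces the real answer) and `speak` (plays audio or text). If the reply isn't ready by the deadline (1.3 s here, taken from the comment), a filler sound is spoken first while the reply keeps generating in the background.

```python
import asyncio

# Deadline before the assistant should start making *some* sound.
# 1.3 s comes from the comment above; tune it for your use case.
DEFAULT_DEADLINE_S = 1.3

async def respond(generate_reply, speak, filler="umm...", deadline_s=DEFAULT_DEADLINE_S):
    """Speak a filler if the real reply isn't ready by the deadline.

    generate_reply: hypothetical coroutine function returning the reply text.
    speak: hypothetical coroutine function that plays the given text.
    """
    task = asyncio.create_task(generate_reply())
    try:
        # shield() keeps the reply task running even if wait_for times out.
        reply = await asyncio.wait_for(asyncio.shield(task), deadline_s)
    except asyncio.TimeoutError:
        # Reply is late: fill the silence, then wait for the real answer.
        await speak(filler)
        reply = await task
    await speak(reply)
    return reply
```

A fast reply is spoken directly. A slow one gets "umm..." first, so the caller hears something within the deadline.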

Depends on what you're aiming for. For my use case, I'm aiming for the feeling of talking to another human. I built an iOS app for little kids to call Santa. Low latency was important. Now I'm working on a mock interview experience; same deal, needs to feel like the real thing.

Re: contextualizing the filler. No, but it's a good idea :) This thread made me think there's a way to generate one on the fly based on the first part of what the person has said. The challenge, though, is that filler phrases seem to me to relate to what the person said last, not first.
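One way to act on the "last, not first" observation is to key the filler on the tail of the partial transcript. This is a hypothetical sketch that swaps generation for a simple keyword lookup. The trigger words, filler phrases, and the `pick_filler` helper are all made up for illustration; a real system might instead prompt a model with the last few words.

```python
# Hypothetical rules: (trigger words, filler phrase). They are matched against
# the *end* of the partial transcript, since filler tends to react to what
# the speaker said last.
FILLER_RULES = [
    ({"why", "how"}, "Hmm, good question..."),
    ({"thanks", "thank"}, "Oh, you're welcome..."),
    ({"christmas", "presents", "gift"}, "Ho ho ho, let me think..."),
]
DEFAULT_FILLER = "Mhm..."

def pick_filler(partial_transcript, tail_words=4):
    """Return a filler phrase based on the last few words spoken."""
    words = [w.strip(".,!?").lower() for w in partial_transcript.split()]
    tail = set(words[-tail_words:])
    for triggers, filler in FILLER_RULES:
        if triggers & tail:
            return filler
    return DEFAULT_FILLER
```

For example, `pick_filler("Why is it snowing? Anyway, thanks Santa")` picks the "thanks" filler, because the opening "why" falls outside the last four words.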