by PranayKumarJain
137 days ago
This is neat. A couple test cases that have bitten us on real voice agent deployments (beyond noise/accents):

- Barge-in / interruption: user starts talking mid-agent-sentence, agent should stop + recover state.
- DTMF flows + mixed-mode ("press 1", then spoken intent). Also: false DTMF (ASR hears "one" as tone).
- Silence / dead air / voicemail: detect long silence, prompt once, then gracefully end; detect voicemail greeting.
- Transfers: warm vs cold transfer, verifying you actually bridged the call + preserving context.
- Telephony weirdness: jitter/packet loss, codec changes (PCMU vs OPUS), partial transcripts, delayed ASR.
- Guardrails: PII capture + confirmation, profanity de-escalation, "agent must not comply" tests.

One UX thought: record/replay (store the raw audio + timing) so regressions are deterministic and you can run “golden” call fixtures in CI without placing a real call every time.

(We build production voice agents at eboo.ai; happy to share a small bundle of “gotcha” scenarios if useful.)
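To make the record/replay idea concrete, here is a rough sketch of a "golden" fixture for the barge-in case. Everything in it is hypothetical: `Event`, `ToyAgent`, and `replay` are illustrative names, not part of any real framework, and plain strings stand in for the raw audio a real fixture would store. The point is only that replaying recorded timestamps instead of a live call makes the run deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded call event (hypothetical fixture format)."""
    t_ms: int       # offset from call start, taken from the recording
    kind: str       # "call_start", "user_speech_start", "user_speech_end"
    data: str = ""  # stand-in for raw audio / transcript payload


class ToyAgent:
    """Hypothetical stand-in for a real agent: it speaks and stops on barge-in."""

    def __init__(self):
        self.speaking_until_ms = 0
        self.log = []

    def say(self, now_ms, text, duration_ms):
        self.speaking_until_ms = now_ms + duration_ms
        self.log.append((now_ms, "say", text))

    def on_event(self, ev):
        if ev.kind == "call_start":
            self.say(ev.t_ms, "greeting", 3000)
        elif ev.kind == "user_speech_start":
            # Barge-in: the user started talking while the agent was still speaking.
            if ev.t_ms < self.speaking_until_ms:
                self.speaking_until_ms = ev.t_ms
                self.log.append((ev.t_ms, "stop", "barge-in"))
        elif ev.kind == "user_speech_end":
            self.say(ev.t_ms, f"reply:{ev.data}", 2000)


def replay(fixture, agent):
    """Feed recorded events in timestamp order; no sleeping, no real call."""
    for ev in sorted(fixture, key=lambda e: e.t_ms):
        agent.on_event(ev)
    return agent.log


BARGE_IN_FIXTURE = [
    Event(0, "call_start"),
    Event(1200, "user_speech_start"),  # interrupts the 3000 ms greeting
    Event(2500, "user_speech_end", "cancel my order"),
]

GOLDEN = [
    (0, "say", "greeting"),
    (1200, "stop", "barge-in"),
    (2500, "say", "reply:cancel my order"),
]

if __name__ == "__main__":
    assert replay(BARGE_IN_FIXTURE, ToyAgent()) == GOLDEN
    print("golden fixture matches")
```

In CI the same `replay` call would run once per stored fixture, and a changed log would flag a regression without anyone having to place a call.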