by squeaky-clean
998 days ago
You could probably reduce the "delay" by using a soundboard of pre-generated filler material and playing that while you type the real response: "Let me find that bookmark", "So the thing about that is...", "ummm yeah. so...", "hmmm no not really". You can also use text macros to type the response faster. Here they were trying to get MFA access, so you could map longer phrases that will come up often, like "Okta multi-factor authentication", to numpad 1, the company name to numpad 2, and the IT supervisor's name to numpad 3. If you know the target of the conversation you can tailor what you pre-generate.

I like to mess with scam callers when I get one, and I've noticed some are using some kind of soundboard with a woman's voice (I'm pretty positive it is real and not AI), and they have a planned flow / script. If you try to deviate from the script they have some options to bring you back into it. If you ask them to repeat something, you can notice it's the exact same audio snippet as before. If you accuse them of being a bot they have a few samples of the woman being shocked and mildly embarrassed: "Oh my goodness, do I really sound like a bot? No it's just been a long work day for me. I'm sorry about that."

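For what it's worth, the text-macro part is about as simple as it sounds. A rough sketch, assuming a made-up key-to-phrase table (the key names `num1`..`num3` and the bracketed placeholder phrases are hypothetical, not from any real tool):

```python
# Hypothetical macro table: short key names mapped to long phrases.
# Key names and the bracketed placeholders are illustrative only.
MACROS = {
    "num1": "Okta multi-factor authentication",
    "num2": "<company name>",
    "num3": "<IT supervisor name>",
}


def expand(tokens):
    """Replace any token that is a macro key with its phrase; pass the rest through."""
    return " ".join(MACROS.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(expand(["please", "reset", "my", "num1"]))
```

Anything that isn't a known key passes through unchanged, so normal typing and macro keys can be mixed freely.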
Live transcription in real time has been a thing forever, so there’s no reason for me to think this couldn’t all be glued together into a “voice changer” like the typical super deep “I have your son, give me a million dollars” boxes, except instead of doing frequency modulation it pipes the audio to a model trained on someone’s voice and applies it. Transcribing to text probably isn’t even needed, because why would it be for machine-to-machine modification? It only needs to go to text for human consumption.
Raw PCM bits from audio in -> AI model trained on victim’s voice -> line out to phone or VoIP app.
We totally have the compute to do that. Probably with our phones.
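
To put rough numbers on the "in -> model -> out" shape, here is a minimal sketch of just the framing part. Everything in it is an assumption for illustration: 16 kHz, 16-bit mono PCM, 20 ms frames, and a `passthrough` stand-in where the model would sit. It does no voice conversion at all.

```python
from typing import Callable, Iterator

# Assumed audio format (not from the thread): 16 kHz, 16-bit mono, 20 ms frames.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20

# 16000 samples/s * 0.020 s = 320 samples, at 2 bytes each = 640 bytes per frame.
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def frames(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield fixed-size frames, zero-padding the final frame if it is short."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes].ljust(frame_bytes, b"\x00")


def run_pipeline(pcm: bytes, transform: Callable[[bytes], bytes]) -> bytes:
    """Pass each frame through `transform` and concatenate the results."""
    return b"".join(transform(frame) for frame in frames(pcm))


def passthrough(frame: bytes) -> bytes:
    """Placeholder for the model stage: returns the frame unchanged."""
    return frame


if __name__ == "__main__":
    out = run_pipeline(b"\x01" * 1000, passthrough)
    print(FRAME_BYTES, len(out))
```

Under these assumptions, whatever replaces `passthrough` has to finish each 640-byte frame in under 20 ms to keep up with the incoming audio, and the frame length itself adds at least 20 ms of delay before any processing time.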