Hacker News new | ask | show | jobs
by hovering_nox 1071 days ago
There is a mod for Skyrim where someone piped together multiple AI models. It goes like this: You speak into your microphone and ask a NPC something. This gets transcribed (voice to text) by Whisper AI. This transcript gets send to eg. GPT-4 with a pre-prompt engineered to give background, current information and the "personality" for the NPC you are talking to. The output of this gets piped back to a Text-to-Speech solution like eleven-labs with the original NPC voice.
1 comments

I've seen an example and the weakest link seemed to be the TTS, which sounded several generations behind.