by srinifromsalem 125 days ago
Nice work on the speech-to-speech pipeline! You're absolutely right that it has to go through an intermediate text step - that's actually where a lot of the interesting processing can happen.

I've found that the speech->text->speech approach gives you much more control over output quality. The intermediate text step lets you clean up transcription errors, adjust tone, and even restructure the content before converting back to speech.
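
To make that concrete, here's a rough sketch of what I mean by a text stage in the middle. Everything here is made up for illustration: transcribe() and synthesize() are placeholder stubs standing in for whatever STT/TTS engines you use, and clean_transcript() is just one example of a cleanup step (dropping filler words and stuttered repeats), not how any particular product does it.

```python
import re
from typing import Callable, Iterable

# Hypothetical stand-ins: a real pipeline would call an STT engine and a
# TTS engine here. These stubs just pass text through as bytes.
def transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")

def synthesize(text: str) -> bytes:
    return text.encode("utf-8")

# Standalone filler words, plus any trailing comma/period and whitespace.
FILLERS = re.compile(r"\b(?:um+|uh+|er+)\b[,.]?\s*", re.IGNORECASE)

def clean_transcript(text: str) -> str:
    """Example text-stage step: drop fillers and immediately repeated words."""
    text = FILLERS.sub("", text)
    # Collapse runs like "the the" into a single "the".
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Normalize whitespace and capitalize the first letter.
    text = re.sub(r"\s+", " ", text).strip()
    return text[:1].upper() + text[1:] if text else text

def speech_to_speech(audio: bytes,
                     text_steps: Iterable[Callable[[str], str]]) -> bytes:
    """speech -> text -> (text processing steps) -> speech."""
    text = transcribe(audio)
    for step in text_steps:
        text = step(text)
    return synthesize(text)
```

Since each step is just a str -> str function, you can add tone adjustment or restructuring by appending another function to text_steps without touching the audio ends.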

Have you experimented with different text processing steps in between? I've been building something similar at voicevoyage.io focused on that middle text processing layer - turning raw transcriptions into polished content before the final output.

1 comment

Yes. Most of the MCP-based search lookup now happens at the text stage.

voicevoyage.io looks interesting. Will keep an eye on it.