

by srinifromsalem 125 days ago

Nice work on the speech-to-speech pipeline! You're absolutely right that it has to go through an intermediate text step; that's actually where a lot of the interesting processing can happen.

I've found that the speech->text->speech approach gives you much more control over the output quality. The text intermediate step lets you clean up transcription errors, adjust tone, and even restructure the content before converting back to speech.

Have you experimented with different text processing steps in between? I've been building something similar at voicevoyage.io focused on that middle text processing layer - turning raw transcriptions into polished content before the final output.
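A minimal Python sketch of the speech->text->speech shape described above, with a pluggable text-processing layer in the middle. The `stt` and `tts` callables are hypothetical placeholders for whatever recognizer and synthesizer you actually use, not a real library's API. The three rule-based stages (`remove_fillers`, `normalize_whitespace`, `punctuate`) are illustrative stand-ins for the heavier cleanup, tone, and restructuring steps mentioned in the comment. This is not voicevoyage.io's implementation.

```python
import re
from typing import Callable, List

# A text stage takes a transcript and returns a processed transcript.
TextStage = Callable[[str], str]

# Common filler words, optionally followed by a comma and trailing spaces.
# The \b anchors keep words like "umbrella" or "term" intact.
FILLERS = re.compile(r"\b(?:um+|uh+|erm)\b,?\s*", re.IGNORECASE)


def remove_fillers(text: str) -> str:
    """Drop filler words that recognizers often transcribe verbatim."""
    return FILLERS.sub("", text)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def punctuate(text: str) -> str:
    """Capitalize the first character and ensure terminal punctuation."""
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".!?" else text + "."


def process_text(text: str, stages: List[TextStage]) -> str:
    """Run the transcript through each text stage in order."""
    for stage in stages:
        text = stage(text)
    return text


def speech_to_speech(
    audio: bytes,
    stt: Callable[[bytes], str],  # hypothetical speech-to-text hook
    tts: Callable[[str], bytes],  # hypothetical text-to-speech hook
    stages: List[TextStage],
) -> bytes:
    """Transcribe, clean up the text, then synthesize the result."""
    transcript = stt(audio)
    cleaned = process_text(transcript, stages)
    return tts(cleaned)


# Demo with fake engines: the "audio" is ignored and "speech" is just UTF-8 bytes.
fake_stt = lambda audio: "um so the  meeting is uh moved to friday"
fake_tts = lambda text: text.encode("utf-8")

result = speech_to_speech(
    b"", fake_stt, fake_tts, [remove_fillers, normalize_whitespace, punctuate]
)
print(result)
```

Keeping the middle layer as a plain list of `str -> str` functions means an LLM rewrite or a search/lookup step can be slotted in as just another stage without touching the audio ends.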


graphitout 123 days ago

Yes. Most of the MCP-based search lookups now happen at the text stage.

voicevoyage.io looks interesting. Will keep an eye on it.
