by mortimerp9 935 days ago

Hello, I work on seamless.

> It runs, but for any audio input I tried (20s/40s/300s; you need to provide WAV, not MP3) I get just one short sentence back in the target language that seems unrelated to my audio input (i.e. Tous les humains sont créés égaux).

You might want to open an issue on GitHub for that one. The model is made to work on short utterances; if you have a long speech, you'll want to segment it first. I've tried "tous les humains sont créés égaux" on the demo: https://seamless.metademolab.com/expressive (which runs the same code as in the repo) and the output was correct. Maybe something is going wrong in the conversion of the input audio?
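For the segmentation step, here is a minimal sketch using only Python's standard `wave` module. It is not code from the seamless repo: `split_wav`, the `chunk_XXXX.wav` naming, and the 10-second default are hypothetical choices for illustration. It cuts at fixed intervals, so it can split in the middle of a word; splitting on silence or with a voice activity detector would be the more careful approach.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=10.0):
    """Split a PCM WAV file into fixed-length chunks and return their paths.

    Hypothetical helper for illustration; the chunk length is an assumption,
    not a value documented for the model.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of input
            path = out_dir / f"chunk_{index:04d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Reuse channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

Calling `split_wav("speech.wav", "chunks")` writes `chunks/chunk_0000.wav`, `chunks/chunk_0001.wav`, and so on, which can then be translated one at a time. Printing `wave.open(path).getparams()` on the input is also a quick way to see the sample rate and channel count the conversion actually produced; which values the model expects is something to confirm in the repo.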

> Oh, and why is Whisper a dependency? It seems unnecessary if FB has their own model?

Whisper is a dependency as it's used as a baseline for evaluation. You can check out the paper for explanations.

1 comment

mightytravels 929 days ago

I tried audio as short as 10s and it still returns something random. How short does the audio need to be? Text works fine, but I can't get audio-to-audio to work.
