by swyx
596 days ago
> the more I am convinced that Google has trained a two-speaker “podcast discussion” model that directly generates the podcast off the back of an existing multimodal backbone.

I have good and bad news for you: they did not! We were the first podcast to interview the audio engineer who led the audio model: https://www.latent.space/p/notebooklm

TL;DR: they did confirm that the transcript and the audio are generated separately, but yes, the TTS model is trained far beyond anything we have in OSS or commercially available.