by tmshapland 249 days ago
Fascinating! How did you decouple the speaker-specific vocal characteristics (timbre, pitch range) from the accent-defining phonetic and prosodic features in the latent space?

We didn't, at least not explicitly. Because we fine-tuned this model for accent classification, its later transformer layers appear to ignore vocal characteristics that don't carry accent information. I verified this for gender, for example.
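The reply doesn't say how the gender check was done. One common way to test whether a layer still encodes a speaker attribute is a linear probe. You train a simple classifier on that layer's embeddings to predict the attribute, then compare its held-out accuracy to chance. Chance-level accuracy suggests the attribute is not linearly recoverable from that layer. The NumPy sketch below assumes this approach. `probe_accuracy`, its parameters, and the synthetic "early"/"late" embeddings are hypothetical stand-ins for real per-layer hidden states, not the author's actual setup.

```python
import numpy as np


def probe_accuracy(features, labels, train_frac=0.8, steps=500, lr=0.1, seed=0):
    """Fit a logistic-regression probe that predicts binary `labels` from
    `features` and return its accuracy on a held-out split.

    features: (n_samples, dim) array, e.g. mean-pooled hidden states of one layer
              (hypothetical input; any fixed-size embedding works).
    labels:   (n_samples,) array of 0/1 values for a speaker attribute.
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    idx = rng.permutation(n)
    split = int(train_frac * n)
    tr, te = idx[:split], idx[split:]

    # Standardize with training-set statistics only, to avoid test leakage.
    mu = features[tr].mean(axis=0)
    sd = features[tr].std(axis=0) + 1e-8
    x = (features - mu) / sd
    y = labels.astype(float)

    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        # Full-batch gradient descent on mean binary cross-entropy.
        z = np.clip(x[tr] @ w + b, -30.0, 30.0)  # clip to avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-z))
        err = p - y[tr]
        w -= lr * (x[tr].T @ err) / len(tr)
        b -= lr * err.mean()

    preds = (x[te] @ w + b) > 0
    return float((preds == labels[te].astype(bool)).mean())


# Synthetic demo data (made up for illustration, not real model activations).
rng = np.random.default_rng(1)
n, dim = 400, 32
gender = rng.integers(0, 2, size=n)

# "Early" layer: the attribute shifts one direction, so it is linearly encoded.
early = rng.normal(size=(n, dim))
early[:, 0] += 3.0 * gender

# "Late" layer: pure noise with respect to the attribute.
late = rng.normal(size=(n, dim))

print("early-layer probe accuracy:", probe_accuracy(early, gender))
print("late-layer probe accuracy: ", probe_accuracy(late, gender))
```

On real data you would extract `features` from each transformer layer in turn and look for where probe accuracy falls toward chance. A probe that fails only rules out *linearly* decodable information; a nonlinear probe is a stricter check.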