Hacker News
by atum47 819 days ago
I've always wanted to implement an FFT from scratch and play with it to separate audio waves, but then a full-time job came along. I guess once you separate the vocals from everything else you can just feed them to a speech-to-text system?

To be completely honest, as someone who does not speak English natively, I find some lyrics hard to understand. I've seen native English speakers have this problem too. I think it's only natural for a NN to make the same mistakes.

1 comment

Source separation is commonly done by applying masks to the spectrogram. Deep learning is used to learn the parameters of the masks for the different instruments. As you mentioned, this is the approach we will follow in the subsequent steps.
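
To make the masking idea concrete, here is a rough sketch in plain Python. It is a toy under loud assumptions: it works on a single frame instead of a full spectrogram, the "sources" are two made-up sine waves, and the binary mask is set by hand with an arbitrary `cutoff` rather than learned by a network. The function names (`fft`, `ifft`, `apply_mask`) are mine, not from any library.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    y = fft([v.conjugate() for v in X])
    return [v.conjugate() / n for v in y]


def apply_mask(spectrum, mask):
    """Element-wise product of a spectrum with a real-valued mask."""
    return [s * m for s, m in zip(spectrum, mask)]


# Toy mixture (assumption): a low tone at bin 3 plus a quieter high tone at bin 12.
N = 64
low = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
high = [0.5 * math.sin(2 * math.pi * 12 * t / N) for t in range(N)]
mixture = [a + b for a, b in zip(low, high)]

spectrum = fft(mixture)

# Hand-set binary mask (a real system would predict this with a trained model).
# min(k, N - k) treats each bin and its mirrored negative-frequency bin alike.
cutoff = 8
mask = [1.0 if min(k, N - k) < cutoff else 0.0 for k in range(N)]
inverse_mask = [1.0 - m for m in mask]

# Each source estimate is the inverse transform of the masked spectrum.
low_est = [v.real for v in ifft(apply_mask(spectrum, mask))]
high_est = [v.real for v in ifft(apply_mask(spectrum, inverse_mask))]
```

Because both tones sit exactly on FFT bins, the two masks split them cleanly here. Real audio is messier: you would compute the transform over short overlapping windows (the spectrogram), and the mask would usually be a soft value between 0 and 1 per time-frequency cell.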