by jacquesm
2351 days ago
This is a good starting point, but it ends just when things get interesting. If you are going to process audio for ML, make sure you experiment with normalizing the input volume; this can make a huge difference. If your inputs are in stereo, try processing a mono mix, a single channel, and the full stereo signal to see which one performs better. Finally, if you pre-process the audio using an FFT, try different FFT sizes.
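To make the normalization and channel experiments concrete, here is a minimal sketch in plain Python. The function names (`peak_normalize`, `channel_variants`), the choice of peak normalization rather than RMS or loudness normalization, and the sample values are all assumptions for illustration, not something from the article.

```python
def peak_normalize(channels, target_peak=1.0):
    """Scale all channels by one shared gain so the loudest sample hits target_peak.

    A shared gain keeps the left/right balance of a stereo clip intact.
    """
    peak = max((abs(s) for ch in channels for s in ch), default=0.0)
    if peak == 0.0:
        return [list(ch) for ch in channels]  # silence: nothing to scale
    gain = target_peak / peak
    return [[s * gain for s in ch] for ch in channels]


def channel_variants(left, right):
    """Build the three input layouts to compare for a stereo clip."""
    mono_mix = [(l + r) / 2.0 for l, r in zip(left, right)]
    return {
        "mono_mix": [mono_mix],                   # average of both channels
        "left_only": [list(left)],                # one channel, the other dropped
        "stereo": [list(left), list(right)],      # both channels kept separate
    }


# Tiny made-up stereo clip, only to exercise the functions.
left = [0.0, 0.1, -0.2, 0.05]
right = [0.0, 0.3, -0.1, 0.25]

variants = {
    name: peak_normalize(channels)
    for name, channels in channel_variants(left, right).items()
}

for name, channels in variants.items():
    print(name, len(channels), "channel(s)")
```

Each entry in `variants` would then go through the same feature extraction and training run, so the only thing that differs between experiments is the input layout.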
The trade-off for window size is between frequency resolution and time resolution. A bigger window gives you narrower bands, so more frequency resolution, but less temporal resolution, which matters when an onset or transient is significant in the analysis. Similarly, hop size will determine how 'leaky' the process is and how fine-grained the windows will be. This can affect the detection of quick peaks or changes, possibly smearing them across a few windows.
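The numbers behind that trade-off are easy to tabulate. This is a rough sketch that assumes a 16 kHz sample rate and a hop of one quarter of the window; both values, and the helper name `stft_resolution`, are picked for illustration only.

```python
def stft_resolution(sample_rate, n_fft, hop):
    """Return the basic resolution figures for one window/hop choice."""
    return {
        "bin_width_hz": sample_rate / n_fft,        # spacing between FFT bins
        "window_ms": 1000.0 * n_fft / sample_rate,  # time span one window covers
        "hop_ms": 1000.0 * hop / sample_rate,       # time step between windows
        "overlap": 1.0 - hop / n_fft,               # fraction shared with the next window
    }


SAMPLE_RATE = 16000  # assumed; use the rate of your own recordings

for n_fft in (256, 512, 1024, 2048):
    r = stft_resolution(SAMPLE_RATE, n_fft, hop=n_fft // 4)
    print(
        f"n_fft={n_fft:5d}  bin={r['bin_width_hz']:7.2f} Hz  "
        f"window={r['window_ms']:6.1f} ms  hop={r['hop_ms']:5.1f} ms  "
        f"overlap={r['overlap']:.0%}"
    )
```

Doubling the window halves the bin width but doubles the stretch of time each frame averages over, so a click much shorter than the window is spread across every frame that overlaps it.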