Hacker News
by
briansm
302 days ago
I believe YouTube still uses 40 mel-scale bands as feature data; Whisper uses 80, which gives finer spectral detail but is naturally more computationally intensive to process. Modern hardware allows for that.
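To make the 40 vs 80 difference concrete, here is a rough pure-Python sketch of a triangular mel filterbank. The mel formula is the common `2595 * log10(1 + f/700)` variant, and `n_fft=400` with a 16 kHz sample rate are illustrative assumptions on my part, not a claim about what either system does internally. The helper names are made up for this example.

```python
import math

def hz_to_mel(hz):
    # Common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft=400, sample_rate=16000):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    filters = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            # Triangle: clamp the lower of the two slopes at zero.
            row.append(max(0.0, min(rising, falling)))
        filters.append(row)
    return filters

def mel_features(power_spectrum, filters):
    # One weighted sum per filter, so the cost scales with the number of bands.
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in filters]

# A flat dummy power spectrum for one frame (201 bins for n_fft=400).
frame = [1.0] * 201
fb40 = mel_filterbank(40)
fb80 = mel_filterbank(80)
feat40 = mel_features(frame, fb40)
feat80 = mel_features(frame, fb80)
print(len(feat40), len(feat80))  # 40 80
```

With 80 bands each triangle covers roughly half the mel range of its 40-band counterpart, which is where the finer spectral detail comes from, and every frame costs twice as many weighted sums (plus a feature vector twice as wide for the model downstream).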