Hacker News
by kragen 749 days ago
this is exciting! it's still at prototype stage: 'getting about 90% accuracy [distinguishing between the spoken digits 'zero' to 'nine',] with the code as it stands.'

i wonder if modern continuous optimization algorithms could yield a neural network that would do better than this mfcc approach at, perhaps, even lower computational cost

they seem to have gotten more expensive lately, though (11.83¢ in quantity 500), and lcsc is out of stock on the ch32v003. they only have in stock ch32v203 and up, which costs 37.5¢. https://www.lcsc.com/products/Microcontroller-Units-MCUs-MPU...

digi-key, as usual, doesn't list the part at all

3 comments

If you search a part number in Google and the only datasheet result is from an unpronounceable Chinese website, there's a very good chance it's not going to be on Digikey. LCSC or AliExpress will be your only options. Even when designing boards, you have to consider whether you want to pick parts from the LCSC library or Digikey, because they don't carry all the same parts, and even parts that you would think are jellybean don't have the exact same package on both sites (especially SOT packages: similarly sized, but not exactly the same).
if you think chinese websites are 'unpronounceable' you probably shouldn't try to design hardware
I do design hardware for a living, it all uses quality components from trusted Western vendors and suppliers.
Replacing the codebook approach with a statistical/DNN model is more likely to give higher accuracy than getting rid of MFCCs as the spectral representation (at least in general ASR). (Arguably, using Mel spectra was the least controversial design choice made for Whisper.)
thank you! those are good points. i was thinking that maybe you could get by with some relatively sparse convolutional layers over the raw sound samples and save yourself the expense of doing a real fourier transform, but maybe that's a dumb idea
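To make the raw-sample convolution idea above concrete, here is a minimal sketch of a strided 1-D convolution that uses only integer multiply-accumulate, a shift, and a clamp. It is not code from the project under discussion: the function name `conv1d_int`, the 8-bit ranges, and the shift-based rescaling are all assumptions chosen for illustration.

```python
def conv1d_int(samples, kernels, stride, shift):
    """Strided 1-D convolution using only integer arithmetic (hypothetical sketch).

    samples: list of ints, e.g. 8-bit audio samples in -128..127
    kernels: list of equal-length lists of int weights in -128..127
    stride:  hop between successive windows, in samples
    shift:   right shift applied to each accumulator to rescale the output
    """
    width = len(kernels[0])
    out = []
    for start in range(0, len(samples) - width + 1, stride):
        window = samples[start:start + width]
        row = []
        for kernel in kernels:
            # Integer multiply-accumulate only; no floating point needed.
            acc = sum(w * x for w, x in zip(kernel, window))
            # An arithmetic right shift stands in for a floating-point scale.
            acc >>= shift
            # ReLU, then saturate to the positive int8 range.
            row.append(max(0, min(127, acc)))
        out.append(row)
    return out
```

A stride larger than 1 is what keeps the cost down here, since each kernel is only evaluated once per hop rather than once per sample.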
It is a good idea that is worth trying out! As with anything, there are tradeoffs, so it is not guaranteed to be better in this particular case. The ability to use low-bit-depth integer operations (which is easy for a neural net) should be beneficial on a CPU without a floating-point unit. But the weights need to be stored, and it can be difficult to match FFT efficiency, depending on what resolution is actually needed and used.
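The tradeoff described above can be put into rough numbers. The sketch below compares a back-of-envelope multiply count for a radix-2 FFT against the multiply-accumulate count and weight storage of one strided convolutional layer. The frame size, kernel count, kernel width, and stride are hypothetical values picked for illustration, and the FFT figure ignores real-input optimizations, trivial twiddle factors, and the mel filterbank and DCT steps that an MFCC front end adds on top.

```python
def fft_real_multiplies(n):
    """Rough real-multiply count for a radix-2 complex FFT of size n.

    Assumes n is a power of two: (n/2) * log2(n) butterflies, each with
    one complex multiply, counted as 4 real multiplies.
    """
    log2_n = n.bit_length() - 1
    return 4 * (n // 2) * log2_n


def conv_macs(n_samples, n_kernels, width, stride):
    """Multiply-accumulates for one strided 1-D conv layer over a frame."""
    windows = (n_samples - width) // stride + 1
    return windows * n_kernels * width


def conv_weight_bytes(n_kernels, width, bytes_per_weight=1):
    """Storage for the layer's weights (default: one byte per int8 weight)."""
    return n_kernels * width * bytes_per_weight


# Hypothetical example: a 256-sample frame, 16 kernels of width 32, stride 16.
print(fft_real_multiplies(256))       # 4096
print(conv_macs(256, 16, 32, 16))     # 7680
print(conv_weight_bytes(16, 32))      # 512
```

With these particular made-up parameters the convolution costs more multiplies per frame than the FFT and also needs 512 bytes of weights, which is the sense in which matching FFT efficiency can be hard. Fewer or shorter kernels, or a longer stride, would shift the balance.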
They also don't list any HBM or GDDR7 memory, which is frustrating as I'm trying to use KiCad to design a cheaper PCI card... but finding any decent documentation on those chips is impossible at the moment.
you may need a wechat account to sign up for the necessary chinese-language-only web forums in shenzhen