Hacker News
by firefoxd 1151 days ago
Simple, straight to the point, and super useful.

One place I used these was on a toy AI assistant. I recorded myself saying a trigger word thousands of times, cut the audio into pieces, and converted each piece to a spectrogram image. I then fed those into a model to train it to recognize the trigger word.
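Here is a minimal sketch of the slicing and spectrogram step. It assumes mono audio already loaded as a NumPy float array at 16 kHz. The clip length, FFT size, hop length and function names are illustrative assumptions, not the values or code used in the project described above. The spectrogram is a plain Hann-windowed short-time Fourier transform.

```python
import numpy as np

def slice_clips(samples: np.ndarray, sample_rate: int, clip_seconds: float = 1.0) -> list[np.ndarray]:
    """Cut a mono waveform into non-overlapping fixed-length clips (the remainder is dropped)."""
    clip_len = int(sample_rate * clip_seconds)
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

def spectrogram(clip: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform.

    Assumes len(clip) >= n_fft. Returns shape (n_fft // 2 + 1, n_frames),
    i.e. frequency bins by time frames.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop
    frames = np.stack([clip[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Usage: a 3 s synthetic 440 Hz tone stands in for a real recording.
sr = 16000
t = np.arange(3 * sr) / sr
audio = np.sin(2 * np.pi * 440 * t)
clips = slice_clips(audio, sr)
spec = spectrogram(clips[0])
print(len(clips), spec.shape)  # 3 (257, 61)
```

Fixed-length clips keep every spectrogram the same shape, which is what a fixed-size model input needs.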

Before switching to spectrograms, I was feeding the WAV files in directly, and that was incredibly resource-intensive on my laptop. The image files were easier to process in real time. This tool would be useful for debugging that kind of pipeline.
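One common way to turn a magnitude spectrogram into an image file is to convert it to decibels, clip it to a fixed dynamic range, and scale it to 0-255. The sketch below assumes a NumPy magnitude array as input. It writes a binary PGM file only because that format needs no imaging library. The 80 dB range and the function names are assumptions, not taken from the comment. Opening the resulting file in an image viewer is one way to eyeball whether the clips look right.

```python
import numpy as np

def to_uint8_image(mag: np.ndarray, top_db: float = 80.0) -> np.ndarray:
    """Map a magnitude spectrogram to 8-bit grayscale on a log (dB) scale."""
    db = 20.0 * np.log10(np.maximum(mag, 1e-10))   # avoid log(0)
    db = np.maximum(db, db.max() - top_db)          # keep only the loudest `top_db` dB
    scaled = (db - db.min()) / max(db.max() - db.min(), 1e-10)
    # Flip vertically so low frequencies end up at the bottom of the image.
    return (scaled[::-1] * 255).round().astype(np.uint8)

def write_pgm(path: str, img: np.ndarray) -> None:
    """Write a 2-D uint8 array as a binary (P5) PGM file."""
    height, width = img.shape
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        f.write(img.tobytes())  # row-major (C-order) pixel bytes
```

Working in decibels matters because raw magnitudes span several orders of magnitude. A linear mapping would leave most of the image black.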


How would this work with AI? Don’t you need to train the model to discriminate between the trigger word and other words? If all that’s seen during training is the trigger word, the model will just learn to say “yes” to everything, if you get what I mean.
Yes, I have also recorded myself talking on the phone for hours. I should have clarified that.
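The exchange above can be sketched as a labeling step. Trigger-word clips get label 1, and clips cut from ordinary speech (here, the phone-call recordings) get label 0. The two sets are then shuffled together so the model has to discriminate between them. The function names below are hypothetical, and the clips can be any objects, such as the spectrogram arrays. The `always_yes_accuracy` helper illustrates the concern in the question: a model that answers "yes" to everything scores perfectly when there are no negatives.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")

def build_dataset(trigger_clips: Sequence[T], other_clips: Sequence[T], seed: int = 0) -> list[tuple[T, int]]:
    """Pair each clip with a label (1 = trigger word, 0 = anything else) and shuffle."""
    data = [(c, 1) for c in trigger_clips] + [(c, 0) for c in other_clips]
    random.Random(seed).shuffle(data)  # seeded so the split is reproducible
    return data

def always_yes_accuracy(dataset: Sequence[tuple[T, int]]) -> float:
    """Accuracy of a degenerate model that answers 'trigger word' for every clip."""
    return sum(label for _, label in dataset) / len(dataset)

only_triggers = build_dataset(["t1", "t2", "t3"], [])
mixed = build_dataset(["t1", "t2", "t3"], ["o1", "o2", "o3", "o4", "o5", "o6"])
print(always_yes_accuracy(only_triggers))    # 1.0
print(round(always_yes_accuracy(mixed), 3))  # 0.333
```

With negatives in the mix, always answering "yes" is no longer rewarded, so the model has to learn what actually distinguishes the trigger word. Having more negatives than positives also mirrors real use, where most of what the microphone hears is not the trigger word.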