Hacker News
by orange_fritter 953 days ago
This reminds me of discussions about superfluous information in human language sentences. Consider the phrase "that man is bad" versus "that man bad". Somewhat crappy example, but basically yes, an idea can be conveyed in a more efficient representation. What you lose through compaction is the redundancy that keeps the message intact in a noisy environment.
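
To put a rough number on the redundancy point, here is a minimal sketch using a toy model that is my own assumption, not anything from the article: each bit is sent an odd number of times over a channel that flips it independently with some probability, and the receiver takes a majority vote. The function name and the 10% flip rate are made up for illustration.

```python
from math import comb


def majority_error_rate(flip_prob: float, copies: int) -> float:
    """Chance that a majority vote over `copies` noisy repeats of one bit is wrong.

    Toy model (assumption): every copy is flipped independently with
    probability `flip_prob`.
    """
    if copies % 2 == 0:
        raise ValueError("use an odd number of copies so the vote cannot tie")
    flips_needed = copies // 2 + 1  # flips required to overturn the vote
    return sum(
        comb(copies, k) * flip_prob**k * (1 - flip_prob) ** (copies - k)
        for k in range(flips_needed, copies + 1)
    )


# Compact message (1 copy) versus redundant messages (3 and 5 copies).
for copies in (1, 3, 5):
    print(copies, majority_error_rate(0.1, copies))
```

More copies means a longer, "less efficient" message, but the chance of the receiver getting it wrong drops with each added pair of repeats.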

If all you're doing is parsing "Alexa" out of the air... you're going to have a bad time, because realistically the detector needs context beyond the word itself. In AI applications, a proof-of-concept is great, but 99.9% accuracy is basically useless. Imagine if computer RAM were accurate only 99.9% of the time... that's a broken tool.

If it takes 2 seconds to say "Alexa", that's 43,200 2-second chunks in a day, but if the listener is using a sliding window at 60 Hz, that's about 5.2 million opportunities to screw up each day. At 99.9% per window, that works out to roughly 5,000 mistakes a day, so 99.9% success at parsing a 2-second slice of audio is insufficient.
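
The arithmetic above can be checked with a quick sketch. It assumes every window is an independent decision with the same 99.9% accuracy, which is a simplification (overlapping windows are obviously correlated); the constant and function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
CLIP_SECONDS = 2                # time it takes to say "Alexa"
WINDOW_RATE_HZ = 60             # sliding-window evaluations per second
ACCURACY = 0.999                # per-decision accuracy


def expected_errors_per_day(decisions_per_day: int, accuracy: float) -> float:
    """Expected wrong decisions per day, assuming independent decisions."""
    return decisions_per_day * (1 - accuracy)


# Non-overlapping 2-second chunks versus a 60 Hz sliding window.
chunks_per_day = SECONDS_PER_DAY // CLIP_SECONDS
windows_per_day = SECONDS_PER_DAY * WINDOW_RATE_HZ

print("chunks per day:", chunks_per_day)
print("windows per day:", windows_per_day)
print("errors/day (chunks):", expected_errors_per_day(chunks_per_day, ACCURACY))
print("errors/day (windows):", expected_errors_per_day(windows_per_day, ACCURACY))
```

Under the same assumption, getting down to about one mistake a day at 60 Hz would need a per-window error rate of around 1 in 5.2 million.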

At some point, no matter how much training you do for ONLY the word "Alexa", you're going to hit diminishing returns: the model needed to reach the desired accuracy gets bigger and bigger for less and less improvement. Logical context analysis can easily bridge the gap for much larger gains.