by zschuessler 3581 days ago
Fun fact: this won't tell you which bad word. Google's implementation of the speech API returns asterisks for words it deems bad. See:

https://github.com/knpwrs/grumbles.js/blob/master/src/grumbl...
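
For anyone who wants to play with this outside the browser, here is a minimal Python sketch of the same idea: decide whether a transcript contains a filtered word by looking for asterisk runs. It assumes a masked word shows up as a token with two or more consecutive asterisks (something like `***` or `f***`); the exact masking format is a guess, and `find_masked_words` / `contains_masked_word` are made-up names, not anything from grumbles.js or the speech API.

```python
import re

# Assumption: a filtered word comes back as a token containing a run of
# two or more asterisks, e.g. "***" or "f***". The exact masking format
# is not confirmed anywhere in this thread.
MASKED_TOKEN = re.compile(r"\w*\*{2,}\w*")


def find_masked_words(transcript: str) -> list[str]:
    """Return every token in the transcript that looks censored."""
    return MASKED_TOKEN.findall(transcript)


def contains_masked_word(transcript: str) -> bool:
    """Return True if at least one censored-looking token is present."""
    return MASKED_TOKEN.search(transcript) is not None
```

As noted above, this can only say that something was filtered, not which word it was. A single `*` (as in `2 * 3`) is deliberately ignored to cut down on false hits.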

It's also interesting to see which words Google deems bad and which it mysteriously doesn't. The API processes sentence structure in real time and will return "<three asterisks> on me" and "cum to the park" correctly, based on intent. (Sorry for the offensive speech!)

For a side project I needed to find every single English word/phrase the API would filter. Stumbled upon that in amazement.

(Side note: speaking a long list of bad words into a microphone very slowly was the most fun QA I've done)

5 comments

I get that this is a free service and all, but I find that ridiculous. They are basically crippling a service that is global and not aimed at any particular application, based on a very localized interpretation of what is and isn't nice to say. This should be controlled at the application level, with the API providing at most hints about the tone.
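
To make that concrete, here is a rough Python sketch of what application-level control could look like. Everything in it is hypothetical: `RecognitionResult`, `flagged_words`, and `render` are invented for illustration, and the real speech API does not return hints in this shape as far as I know. The point is only that the service hands back the raw transcript plus hints, and each application decides whether to mask.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    """Hypothetical API result: unmasked text plus hints (not a real API)."""
    transcript: str
    flagged_words: frozenset  # lowercase words the service considers offensive


def render(result: RecognitionResult, mask_profanity: bool) -> str:
    """Apply the application's own policy to the raw transcript."""
    if not mask_profanity:
        return result.transcript
    rendered = []
    for token in result.transcript.split():
        # Strip common punctuation so "damn," still matches "damn".
        bare = token.strip(".,!?")
        if bare.lower() in result.flagged_words:
            # Mask only the word itself and keep the punctuation.
            token = token.replace(bare, "*" * len(bare))
        rendered.append(token)
    return " ".join(rendered)
```

A kids' app would call `render(result, mask_profanity=True)`, while a script-matching tool would pass `False` and get the words as spoken.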

Depending on what you use it for, this could render the service useless. Imagine using it to, I don't know, identify Pulp Fiction lines against a corpus of scripts. It would fail spectacularly.

Another context where this could fail very quickly is with speakers of other languages. E.g., if I'm not wrong, saying "Jesus!" might be impolite in (some contexts of) the US. In Spain, we say "Jesus!" when you sneeze, instead of "Bless you!" (and, in general, we are outrageously foul-mouthed compared to the US).

By the way, I can't edit my comment any longer, but when I said:

> In Spain, we say "Jesus!" when you sneeze, instead of "Bless you!" (and, in general, we are outrageously foul-mouthed compared to the US).

...it may sound as if "¡Jesús!" ("Bless you!") is foul-mouthed, when in fact it is something a four-year-old would typically say.

I always found this hilarious: my phone won't let me swear in a text I am sending using voice to text, but it will gladly boom "fuck" over my car's Bluetooth when someone sends me a text with swear words.
Since voice recognition isn't perfect, there is a chance it could produce false positives and write a bad word you didn't say. Then people complain to Google or sue them. Similarly, Google search won't auto-suggest offensive words or certain libelous statements (e.g. "X is a criminal", even if they are, and even if it's a common search phrase).
Sue them for what? Hurt feelings?
Yeah, like I said in response to another comment:

> My original idea was going to involve a dependency on one of the bad words lists available on npm, but then I saw the API censors said words and thought, "Oh, that's easy."

I really wish I could turn this "feature" off. Not only is it annoying, but it also clearly shows people when I am using voice to text and when I am typing.
You can turn it off on Android.
Any reason it's "cum" and not "come to the park"?
I think he's trying to show that the speech API is smart enough to understand the context of the phrase and is not just blindly replacing words based on spelling.
I wonder how it would handle "I've come in my shorts today" vs "I've cum in my shorts today".