Hacker News
by wpietri 2984 days ago
Update: I decided to try it through the Google console, and also to try Amazon's speech recognition through the AWS console.

AWS just let me transcribe my MP3 in a pretty straightforward way once I'd uploaded it to an S3 bucket. The transcript is done in 2-3x real time, and the quality seems decent. It comes as a complex JSON file with confidence numbers and timestamps for every word, plus alternative words when it knows it isn't sure. It's pretty neat.
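To make the "complex JSON file" concrete, here is a minimal sketch that pulls the full transcript and the shaky words out of a result shaped like an Amazon Transcribe batch-job output. It assumes the layout I recall from the Transcribe docs (`results.transcripts` plus a per-word `results.items` list with string-valued `start_time`, `end_time`, and `alternatives[].confidence`). The sample data, the second "alternative" on one word, the `low_confidence_words` helper, and the 0.8 threshold are all hypothetical, made up for illustration.

```python
import json

# Hypothetical sample shaped like an Amazon Transcribe batch result.
# Field names follow the documented layout as best I recall; every word,
# timestamp, and confidence value here is invented for illustration.
SAMPLE = """
{
  "jobName": "example-job",
  "status": "COMPLETED",
  "results": {
    "transcripts": [{"transcript": "Hello world."}],
    "items": [
      {"type": "pronunciation", "start_time": "0.04", "end_time": "0.39",
       "alternatives": [{"confidence": "0.9984", "content": "Hello"}]},
      {"type": "pronunciation", "start_time": "0.40", "end_time": "0.81",
       "alternatives": [{"confidence": "0.6120", "content": "world"},
                        {"confidence": "0.3010", "content": "word"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
"""


def low_confidence_words(doc, threshold=0.8):
    """Hypothetical helper: return (start, end, best_word, other_words)
    for each spoken word whose top alternative scores below threshold."""
    flagged = []
    for item in doc["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items carry no timestamps
        # Numbers arrive as strings, so convert before comparing.
        alts = sorted(item["alternatives"],
                      key=lambda a: float(a["confidence"]), reverse=True)
        best = alts[0]
        if float(best["confidence"]) < threshold:
            flagged.append((float(item["start_time"]),
                            float(item["end_time"]),
                            best["content"],
                            [a["content"] for a in alts[1:]]))
    return flagged


doc = json.loads(SAMPLE)
print(doc["results"]["transcripts"][0]["transcript"])
for start, end, word, others in low_confidence_words(doc):
    print(f"{start:.2f}-{end:.2f}s {word!r} (alternatives: {others})")
```

Sorting the alternatives rather than trusting their order is a defensive choice, since I'm not certain the service always lists the most confident one first.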

Google made me use a sort of query builder interface to construct an API request. The query builder did not actually match the features announced in the blog post, so I just went with what was there. When I eventually got a valid-looking request, it blew up because, it turns out, it can't parse MP3s. So I re-encoded to FLAC and uploaded that. I tried a variety of queries, but none of them worked. The one that got closest complained about a bad value for a field the query builder apparently would not let me add.

I gave up. Squeak, squeak, squeak!

And I should add that the people I know at Google are all perfectly smart, so I don't want anybody to think I'm saying that the individual engineers who made this are dumb or bad. This seems like a giant organizational failure, where what gets built is deeply disconnected from user needs and the lived user experience.

Normally, when I get insight into a place where this happens, it turns out the priority is not actually delivering value but making managers look good according to easily measured but harmful metrics, like "Are we at competitive parity at a feature-checklist level?" or "Did we launch by some made-up deadline so that a manager could claim success?"

If anybody at Google wants to send me their horror stories, please do email or DM me on Twitter. I'd love to know what the hell happened here, and I promise to keep things as confidential as you like.

1 comment

I keep harping on this about Google, but this is so typical of Google. It's the same kinda crap with WebRTC, QUIC, and VP9/WebM. The one complete WebRTC library has to be dug out of Chromium. QUIC, last I checked, is buried in Chromium. VP9/WebM doesn't actually support transparency, but Google went and added a custom extension to support it (in Chromium), so anybody who wants to support alpha with VP9 has to do it Google's nonstandard way (including adding Google code to FFmpeg to do it).

They just throw stuff out there that would otherwise be useful to the world, in the least user-friendly way possible. Then they make a big PR push for a while about how great the new thing is, then they forget about it, and the project languishes.