by priansh 2474 days ago
wav2letter is pretty fast; we haven't been able to break 1.1x on a t2.medium in any of our benchmarks. What's your setup here?

I definitely think it's a big step in the right direction; it's easily 100x faster than DeepSpeech for us.

If I could have anything I wanted for xmas, I'd ask for a speech-to-text system that is fast enough to work in the browser through WASM or something.

2 comments

My setup is a work in progress, see the sibling thread. My 2015 2-core MacBook is probably not faster than a t2.medium, so you should be able to hit the same sort of 0.05x ballpark numbers easily with the same sort of setup.
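For anyone comparing numbers: a minimal sketch of how I'd compute these, assuming the 1.1x and 0.05x figures in this thread are real-time factors (processing time divided by audio duration, so lower is faster). The function name and the timings are hypothetical, picked only to reproduce those two figures.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timings for one minute of audio (not measured data).
slow = real_time_factor(66.0, 60.0)  # the 1.1x case: slower than real time
fast = real_time_factor(3.0, 60.0)   # the 0.05x case: about 20x faster than real time
print(f"slow: {slow:.2f}x, fast: {fast:.2f}x")
```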

Is there a SIMD.js / WASM equivalent of an optimized convolution / GEMM routine? That's pretty much all we'd need to port this to the web... well, that and maybe a language model that isn't 1GB. The wav2letter acoustic model I'm using is based on the LibriSpeech conv_glu architecture, which consists almost entirely of conv1d layers.
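To show why a fast GEMM would cover most of it, here is a rough sketch of a conv1d layer lowered to a single matrix multiply via im2col, followed by a GLU gate. This is plain Python for readability, not wav2letter's code: the function names and the weight layout are my own assumptions, it only handles stride 1 with no padding, and it computes cross-correlation (the usual deep-learning "convolution").

```python
import math


def im2col_1d(x, kernel_width):
    """x is [channels][time]. Returns a (channels * kernel_width) x out_len matrix."""
    channels, length = len(x), len(x[0])
    out_len = length - kernel_width + 1
    return [
        [x[c][t + k] for t in range(out_len)]
        for c in range(channels)
        for k in range(kernel_width)
    ]


def matmul(a, b):
    """Naive GEMM; this is the part an optimized WASM kernel would replace."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(inner)) for j in range(cols)] for row in a]


def conv1d_as_gemm(x, weights):
    """weights is [out_channels][in_channels][kernel_width] (assumed layout)."""
    kernel_width = len(weights[0][0])
    # Flatten each filter in the same (channel, tap) order that im2col_1d uses.
    w_flat = [[w for per_channel in filt for w in per_channel] for filt in weights]
    return matmul(w_flat, im2col_1d(x, kernel_width))


def glu(y):
    """Gated linear unit over channels: first half times sigmoid of second half."""
    half = len(y) // 2
    return [
        [a / (1.0 + math.exp(-b)) for a, b in zip(y[c], y[c + half])]
        for c in range(half)
    ]


# One input channel, two output channels, kernel width 2.
signal = [[1.0, 2.0, 3.0, 4.0]]
filters = [[[1.0, 0.0]], [[0.0, 1.0]]]
gated = glu(conv1d_as_gemm(signal, filters))
```

The im2col step is just memory shuffling, so nearly all the arithmetic ends up inside `matmul`.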

I've honestly already been considering a demo for my main project (which is mixed English / command decoding) that runs entirely in a web page. If you have engineering time to throw at your Christmas wish, we should talk :P

Two years ago someone tweaked Kaldi to build to wasm (https://github.com/adrianbg/kaldi.js). AFAIR it ran at decent speed in a browser (with small models), but it hasn't been maintained since.