Hacker News
by intalentive 662 days ago
It is 100% possible and there are a slew of tricks you can use to get big performance boosts with negligible cost to accuracy.

Do you know what the tricks are?
1. Don’t use LSTMs (4 vector-matrix multiplies) or GRUs (3 multiplies). Use a fixed HiPPO matrix to update the state instead. That is just 1 multiply per step, and because the matrix is fixed you can unroll the recurrence during training, which is much faster than backpropagation through time.

2. Write SIMD intrinsics by hand. None of the libraries are as fast.

3. Don’t use sigmoid or tanh functions as your nonlinear activation. Instead, approximate them with the softsign function, which is much cheaper.
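
To make trick 3 concrete, here is a minimal sketch of softsign next to tanh. The function names are mine, and the rescaled version standing in for sigmoid is an assumption about how you might cover that case, not something stated above.

```python
import math


def softsign(x: float) -> float:
    """x / (1 + |x|): S-shaped with range (-1, 1) like tanh, but no exp call."""
    return x / (1.0 + abs(x))


def softsign_sigmoid(x: float) -> float:
    """Softsign rescaled to (0, 1) as a rough stand-in for the logistic sigmoid.

    Assumption: this rescaling is one possible substitute, chosen for illustration.
    """
    return 0.5 * (softsign(x) + 1.0)


if __name__ == "__main__":
    # Print a few points side by side to see how far the two curves differ.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"{x:+.1f}  tanh={math.tanh(x):+.4f}  softsign={softsign(x):+.4f}")
```

Softsign saturates more slowly than tanh (softsign(1) is 0.5, tanh(1) is about 0.76), so it is probably safer to train the network with it than to swap it into a model trained with tanh.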

It depends on the exact architecture, but these optimizations have yielded a 10-30x improvement for single-threaded, real-time CPU audio applications.
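
For trick 1, here is a rough sketch of why a fixed state matrix lets you unroll the recurrence. The 2x2 matrix `A` and vector `B` are placeholder values, not a real HiPPO matrix, and the function names are made up for illustration. Treat it as one reading of the unrolling remark, not the exact setup described above.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]

# Placeholder values, NOT a real HiPPO matrix: any fixed, stable A works here.
A: Matrix = [[0.9, 0.0], [0.1, 0.8]]
B: Vector = [1.0, 0.5]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def run_sequential(inputs: List[float]) -> Vector:
    """Step-by-step recurrence x_t = A x_{t-1} + B u_t, starting from x = 0."""
    x = [0.0] * len(B)
    for u in inputs:
        ax = matvec(A, x)
        x = [ax_i + b_i * u for ax_i, b_i in zip(ax, B)]
    return x


def build_kernel(length: int) -> List[Vector]:
    """Precompute K[k] = A^k B once; valid only because A and B never change."""
    kernel, col = [], list(B)
    for _ in range(length):
        kernel.append(col)
        col = matvec(A, col)
    return kernel


def run_unrolled(inputs: List[float], kernel: List[Vector]) -> Vector:
    """Final state as a weighted sum of inputs: x_T = sum_k K[k] * u_{T-1-k}."""
    x = [0.0] * len(B)
    for k, u in enumerate(reversed(inputs)):
        x = [x_i + k_i * u for x_i, k_i in zip(x, kernel[k])]
    return x


if __name__ == "__main__":
    signal = [1.0, -2.0, 0.5, 3.0]
    print(run_sequential(signal))
    print(run_unrolled(signal, build_kernel(len(signal))))
```

Both functions give the same final state, but the unrolled form has no step-to-step dependency, so the terms of the sum can be computed in any order or in parallel. With learned gates, as in an LSTM or GRU, no such fixed kernel exists.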

When GPU audio matures, all of this may be unnecessary.