by tysam_and
1007 days ago
RNN inference on a smaller edge controller (all history is cached in a single state per layer, so much lower memory and compute requirements IIRC) :') Very friendly to mobile devices and battery-powered systems. :')))) ;'DDDD
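To make the "single state per layer" point concrete, here is a rough sketch that counts stored elements for a recurrent model versus a transformer's key/value cache. The layer count and width are made-up values for illustration, and `rnn_state_elems` and `kv_cache_elems` are hypothetical helper names. A real RNN such as RWKV may keep several vectors per layer, so treat this as an order-of-magnitude comparison only.

```python
# Hypothetical model dimensions, chosen only for illustration.
N_LAYERS = 24
D_MODEL = 2048


def rnn_state_elems(n_layers: int, d_model: int) -> int:
    """Elements held by a recurrent model: one fixed-size vector per layer.

    Simplifying assumption: the count does not depend on how many
    tokens have been processed so far.
    """
    return n_layers * d_model


def kv_cache_elems(n_layers: int, d_model: int, seq_len: int) -> int:
    """Elements held by a transformer KV cache: one key and one value
    vector per past token, in every layer."""
    return 2 * n_layers * d_model * seq_len


if __name__ == "__main__":
    for seq_len in (1, 500, 5000):
        rnn = rnn_state_elems(N_LAYERS, D_MODEL)
        kv = kv_cache_elems(N_LAYERS, D_MODEL, seq_len)
        print(f"seq_len={seq_len}: rnn_state={rnn} kv_cache={kv}")
```

The recurrent count stays flat as the sequence grows, while the cache count scales linearly with sequence length, which is the property that matters on a small edge device.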
|
Just how much compute/memory are we saving here?
My understanding is that a 1B-parameter transformer needs about 2 GFLOPs per token, so about 1 TFLOP in total for a sequence of 500 tokens (plus several GB of memory).
What would the equivalent be for RWKV? (Let's ignore the inevitable loss penalty, which could be significant.)
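For reference, the transformer side of that estimate can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming the common rule of thumb of roughly 2 FLOPs per parameter per token and ignoring the attention term that grows with sequence length. The helper names `forward_flops` and `weight_bytes` are hypothetical, and it says nothing about the RWKV side of the question.

```python
def forward_flops(n_params: int, n_tokens: int) -> int:
    """Approximate forward-pass FLOPs: ~2 per parameter per token
    (one multiply and one add for each weight). Rule of thumb only."""
    return 2 * n_params * n_tokens


def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory needed just to hold the weights at a given precision."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    n_params = 1_000_000_000  # the 1B-parameter model from the comment

    print(forward_flops(n_params, 1) / 1e9, "GFLOPs per token")
    print(forward_flops(n_params, 500) / 1e12, "TFLOPs for 500 tokens")
    print(weight_bytes(n_params, 2) / 1e9, "GB of weights at 16-bit")
    print(weight_bytes(n_params, 4) / 1e9, "GB of weights at 32-bit")
```

Under these assumptions the weights alone land in the 2 to 4 GB range depending on precision, before counting activations or any cache.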