by ilurk
3867 days ago
> We implemented a RNN with 2 layers and 128 hidden units in hardware and it has been tested using a character level language model. The implementation is more than 21× faster than the ARM CPU embedded on the Zynq 7020 FPGA.

I'm left curious about how the performance gain changes when the network is scaled up in layers and units. Would the performance gap widen as the RNN grows?

They do say in the text that "Figure 8 shows the expected speed up, assuming the data throughput is high enough to handle the parallel processing", so take it with a grain of salt. That figure is a projection, not a measurement, and other factors (as there always are) will most likely prevent ideal scaling.
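
To make that caveat concrete, here's a rough sketch of how a throughput limit would cap the speedup. This is not the paper's model. The function name `capped_speedup` and every number in it are made up for illustration; it only assumes that each operation needs a fixed number of bytes from memory.

```python
def capped_speedup(parallel_units, ops_per_unit, bytes_per_op, bandwidth):
    """Speedup over a single unit when memory bandwidth may be the limit.

    All parameters are hypothetical illustration values, not figures
    taken from the paper.
    """
    # Ops/s if data always arrives in time (the "ideal scaling" assumption).
    ideal_rate = parallel_units * ops_per_unit
    # Ops/s the memory link can actually feed.
    bandwidth_rate = bandwidth / bytes_per_op
    # The slower of the two determines the achieved rate.
    achieved_rate = min(ideal_rate, bandwidth_rate)
    return achieved_rate / ops_per_unit


# Made-up numbers: 1e8 ops/s per unit, 4 bytes per op, 2e9 bytes/s of bandwidth.
for units in (1, 2, 4, 8, 16):
    print(units, capped_speedup(units, ops_per_unit=1e8, bytes_per_op=4, bandwidth=2e9))
```

With these numbers the speedup tracks the unit count up to 4 units, then flattens at 5× no matter how many more units you add, which is the kind of ceiling the "assuming the data throughput is high enough" qualifier leaves out.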