| HN Mirror



	by ilurk 3867 days ago
	> We implemented a RNN with 2 layers and 128 hidden units in hardware and it has been tested using a character level language model. The implementation is more than 21× faster than the ARM CPU embedded on the Zynq 7020 FPGA.

	I'm curious how the performance gain changes as the network scales in layers and units. Would the performance gap widen as the RNN grows?

1 comment

leonardt 3867 days ago

> Figure 8: The execution time is projected to decrease with the increase of number of LSTM cells running in parallel. This can lead to significant performance improvement.

They do say in the text that "Figure 8 shows the expected speed up, assuming the data throughput is high enough to handle the parallel processing", so take it with a grain of salt. There could be, and most likely will be (as there always are), other factors that prevent ideal scaling.
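
To make that caveat concrete, here is a toy model, not taken from the paper. It assumes the compute for one step splits evenly across the parallel LSTM cells, while the data transfer for that step does not shrink and overlaps with compute. The function name `projected_speedup` and both cost parameters are hypothetical, in arbitrary units.

```python
def projected_speedup(n_parallel, compute_time=1.0, transfer_time=0.0):
    """Speedup over a single unit under a simple throughput-limited model.

    compute_time and transfer_time are hypothetical per-step costs in
    arbitrary units; they are illustrative, not measurements from the paper.
    """
    if n_parallel < 1:
        raise ValueError("n_parallel must be at least 1")
    # With one unit, a step takes as long as the slower of compute and transfer.
    baseline = max(compute_time, transfer_time)
    # Compute divides across units, but a step cannot finish before its data arrives.
    parallel = max(compute_time / n_parallel, transfer_time)
    return baseline / parallel


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        ideal = projected_speedup(n)
        limited = projected_speedup(n, transfer_time=0.25)
        print(f"{n:>2} units: ideal {ideal:.1f}x, throughput-limited {limited:.1f}x")
```

With `transfer_time=0` the curve is the ideal linear one, which is roughly what a projection like Figure 8 assumes. Once the transfer cost is nonzero, the speedup flattens when `compute_time / n_parallel` drops below it, and adding more units stops helping.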
