Hacker News
by yolorn123 3141 days ago
The reason is that a GRU has only two gates (update and reset) and, unlike an LSTM, doesn't keep a separate internal cell state. Since it has fewer parameters than an LSTM of the same size, it takes less time to train.
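
To put a rough number on "fewer parameters", here is a minimal sketch that counts weights for a single layer of each. It assumes the textbook formulation with one bias vector per block; real libraries can differ (some keep two bias vectors per block), and the function names and sizes here are just made up for illustration.

```python
def block_params(input_size, hidden_size):
    # One block = input weights + recurrent weights + one bias vector.
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size


def lstm_params(input_size, hidden_size):
    # LSTM: 3 gates (input, forget, output) + 1 candidate cell update = 4 blocks.
    return 4 * block_params(input_size, hidden_size)


def gru_params(input_size, hidden_size):
    # GRU: 2 gates (update, reset) + 1 candidate hidden state = 3 blocks.
    return 3 * block_params(input_size, hidden_size)


if __name__ == "__main__":
    x, h = 128, 256  # hypothetical layer sizes
    lstm, gru = lstm_params(x, h), gru_params(x, h)
    print(lstm, gru, gru / lstm)  # prints: 394240 295680 0.75
```

So under these assumptions a GRU layer has about three quarters of the parameters of an LSTM layer with the same sizes, which is where the per-step compute saving comes from. Whether that means less wall-clock time to reach the same accuracy still depends on the task.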