Hacker News new | ask | show | jobs
by bunderbunder 619 days ago
They didn't remove the hidden state entirely, they just removed it from the input, forget and update gates. I haven't digested the paper either, but I think that in the case of a GRU this means that the hidden state update masking (z_t and r_t in the paper's formulas) only depends on the new input, not the input plus the prior hidden state.