Hacker News
by atlacatl_sv 843 days ago
I believe h' denotes the next hidden state. y(t) predicts the next word, so it is computed from the current hidden state h(t).
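To make the distinction concrete, here is a minimal sketch. It assumes a vanilla (Elman-style) RNN with a tanh state update and a softmax output. The weights, sizes, and function names (`next_state`, `output`) are hypothetical, chosen only for illustration, and are not taken from whatever material the thread is discussing. The output reads only the current state, and the update produces h' from the current state and the incoming input.

```python
import math

# Hypothetical toy weights for a vanilla (Elman-style) RNN; values are illustrative.
# Hidden size 2, input size 2, vocabulary size 3.
W_xh = [[0.5, -0.3], [0.1, 0.4]]                 # input -> hidden
W_hh = [[0.2, 0.0], [-0.1, 0.3]]                 # hidden -> hidden
W_hy = [[1.0, -1.0], [0.5, 0.5], [-0.2, 0.8]]    # hidden -> vocabulary logits


def matvec(W, v):
    """Multiply matrix W (given as a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def output(h):
    """y(t): distribution over the next word, read off the current state h(t) only."""
    return softmax(matvec(W_hy, h))


def next_state(h, x):
    """h': the next hidden state, computed from the current state h and the input x."""
    a = matvec(W_hh, h)
    b = matvec(W_xh, x)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


h = [0.0, 0.0]                      # initial hidden state
inputs = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical one-hot word vectors
for x in inputs:
    h = next_state(h, x)            # h' becomes the current state for this step
    y = output(h)                   # predict the next word from that state
    print([round(p, 3) for p in y])
```

In this convention `output` never looks at x directly; any influence of the input on y(t) flows through h(t).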