Hacker News
by inciampati, 265 days ago
It turns out you can use a fused Triton kernel for a true recurrent GRU and have it train just as fast as the minGRU model. Yeah, it doesn't work for very long contexts, but neither does minGRU (activation memory...)
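
For anyone unfamiliar with the two models, here is a minimal NumPy sketch of the difference, not the fused Triton kernel mentioned above. It assumes the usual GRU equations and the minGRU formulation as I understand it (gate and candidate computed from x_t only). All function names, weight names, and sizes are made up for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru(x, Wz, Uz, Wr, Ur, Wh, Uh):
    """Standard GRU: the gates read h_{t-1}, so steps must run in order."""
    h = np.zeros(Uz.shape[0])
    states = []
    for x_t in x:
        z = sigmoid(Wz @ x_t + Uz @ h)              # update gate
        r = sigmoid(Wr @ x_t + Ur @ h)              # reset gate
        h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h))  # candidate state
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)                         # shape (T, H)

def mingru_loop(x, Wz, Wh):
    """minGRU (assumed form): gate and candidate depend on x_t only."""
    z = sigmoid(x @ Wz.T)
    h_tilde = x @ Wh.T
    h = np.zeros(Wz.shape[0])
    states = []
    for z_t, c_t in zip(z, h_tilde):
        h = (1.0 - z_t) * h + z_t * c_t
        states.append(h)
    return np.stack(states)

def mingru_scan(x, Wz, Wh):
    """Same recurrence, h_t = a_t * h_{t-1} + b_t, solved without a time loop.
    Dividing by the cumulative product is only safe for short sequences;
    real implementations typically use a log-space parallel scan."""
    z = sigmoid(x @ Wz.T)
    a = 1.0 - z
    b = z * (x @ Wh.T)
    A = np.cumprod(a, axis=0)
    return A * np.cumsum(b / A, axis=0)

# Hypothetical toy sizes: T time steps, D input features, H hidden units.
rng = np.random.default_rng(0)
T, D, H = 16, 4, 8
x = rng.normal(size=(T, D))
Wz, Wr, Wh = (0.5 * rng.normal(size=(H, D)) for _ in range(3))
Uz, Ur, Uh = (0.5 * rng.normal(size=(H, H)) for _ in range(3))

h_gru = gru(x, Wz, Uz, Wr, Ur, Wh, Uh)
h_loop = mingru_loop(x, Wz, Wh)
h_scan = mingru_scan(x, Wz, Wh)
print(h_gru.shape, np.allclose(h_loop, h_scan))
```

In this sketch both models produce one H-sized hidden state per time step, which is how I read the activation memory remark: whether the time loop is fused into a kernel or replaced by a scan, training still has to keep activations that grow with sequence length.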