Hacker News
by bitL 2420 days ago
RNNs (LSTM/GRU) tend to have issues with scaling. Attention-based models like the Transformer, on the other hand, scale extremely well.
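
One common reading of the scaling gap is parallelism over sequence positions. The comment does not say this, so treat it as an assumption. Below is a rough sketch with toy scalar models: `rnn_states`, `attention_output`, `attention_all`, and the weights `w_h` and `w_x` are all made-up names for illustration, not anyone's actual implementation. The recurrent loop has to run in order because each state needs the previous one, while each attention output depends only on the inputs.

```python
import math


def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0
    states = []
    for x in xs:
        # Each step reads the previous hidden state, so the steps
        # cannot be computed independently of one another.
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states


def attention_output(xs, i):
    """Toy scalar self-attention for position i (query = key = value = x)."""
    scores = [xs[i] * x_j for x_j in xs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * x_j for w, x_j in zip(weights, xs)) / total


def attention_all(xs):
    # No output depends on another position's output, so these calls
    # could run in any order, or all at once on parallel hardware.
    return [attention_output(xs, i) for i in range(len(xs))]
```

In a real model the attention version becomes a few large matrix multiplications over the whole sequence, which is the kind of work GPUs handle well. The trade-off is that attention cost grows quadratically with sequence length.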