Y
Hacker News
new
|
ask
|
show
|
jobs
by
GardenLetter27
9 days ago
It's not just the architecture but also the data - the decoder only approach lets you train in parallel over blocks of text (no RNN serial waiting), that allows you train on much, much more data.