by cbutner 1713 days ago
It uses a full-sized transformer decoder trained on about 1 million data samples, though with far fewer neural network parameters and training samples than GPT-2 or GPT-3.