by cbutner 1713 days ago
It uses a full-sized transformer decoder, but with far fewer parameters than GPT-2 or GPT-3, and it is trained on far fewer samples: about 1 million.