Hacker News
by jacobn 762 days ago
> Goes to show just how much is in the training data.

And in the scale of the model (num_layers, embed_dim, num_heads), of course ;)
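
For a rough sense of how those three knobs set model size, here's a back-of-the-envelope sketch. It assumes a standard GPT-style block (four embed_dim x embed_dim attention projections plus an MLP with a 4x hidden expansion) and ignores embeddings, biases, and layer norms. The function name and the mlp_ratio parameter are made up for illustration, not taken from any particular library.

```python
def approx_transformer_params(num_layers: int, embed_dim: int,
                              num_heads: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for a stack of standard transformer blocks.

    Assumptions (illustrative only): no embeddings, biases, or layer norms;
    the MLP hidden size is mlp_ratio * embed_dim.
    """
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    # Q, K, V, and output projections, each embed_dim x embed_dim.
    attn = 4 * embed_dim * embed_dim
    # MLP up-projection and down-projection.
    mlp = 2 * mlp_ratio * embed_dim * embed_dim
    # num_heads only splits embed_dim into heads, so it does not
    # change the weight count under these assumptions.
    return num_layers * (attn + mlp)


if __name__ == "__main__":
    print(approx_transformer_params(num_layers=12, embed_dim=768, num_heads=12))
```

Under these assumptions the count grows linearly with num_layers and quadratically with embed_dim, while num_heads leaves it unchanged because each head just gets embed_dim / num_heads dimensions.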