by Ldorigo
338 days ago
Data might be the limiting factor for current transformer architectures, but there's no reason to believe it's a general limiting factor for language models as such (e.g., human brains are "trained" on orders of magnitude less data and still generally perform better than any model available today).