"¿Quién es más macho?" In a very short time, transformers have gone from under 1B, to 1.5B, to 3B, to 5B, to 175B, and now 600B parameters. 1T is only, what, like 67% more parameters, and therefore likely to be achieved in the short term. In fact, the authors of this paper tried 1T but ran into numerical issues that they will surely address soon. Not long after someone crosses 1T, expect 10T to become the next target. And why not? The best-funded AI research groups are in a friendly competition to build the biggest, baddest, meanest m-f-ing models the world has ever seen. Scores continue to increase with diminishing returns, which is all fine and nice, but more importantly it seems we should expect to see machine-generated text getting much better from a qualitative standpoint -- that is, becoming less and less distinguishable from a lot of human output. That has been the trend so far. We live in interesting times.
Otherwise, Google already had a 137B-parameter model in 2017: https://arxiv.org/abs/1701.06538