| The author uses these ([1][2]) diagrams to argue that more compute has diminishing returns. But the 'diminishing returns' are on the accuracy of correctly picking the single right category for a photo out of one thousand. Photos may simply not carry enough information to meaningfully distinguish between categories at that level of accuracy; existing models already exceeded humans' ability at top-5 accuracy in 2015 [3]. It wouldn't be surprising if SOTA models already exceeded humans at top-1. It's possible that the humans who set the baseline were bored and so performed sub-optimally when picking between the 1K classes. But the argument has now become a subtler one, much less clear-cut. As an example of categories that may be difficult to distinguish between, do you feel confident that you can reliably distinguish between the Norwich terrier [4] and the Norfolk terrier [5]? These are two separate categories in ImageNet1k. [1] https://i0.wp.com/blog.piekniewski.info/wp-content/uploads/2... The first diagram shows exponential growth in the compute usage of state-of-the-art deep learning architectures. [2] https://i1.wp.com/blog.piekniewski.info/wp-content/uploads/2... The second diagram shows diminishing returns on ImageNet1k top-1 accuracy from doubling the size of ResNeXt. [3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725... [4] https://www.google.com/search?tbm=isch&as_q=norwich+terrier&... [5] https://www.google.com/search?tbm=isch&as_q=norfolk+terrier&... |
1. Google "difference between norfolk and norwich terrier".
2. Click the first link: https://www.terrificpets.com/articles/10290165.asp.
3. "The Norwich terrier has prick ears, or ears that stand up, seemingly at alert, while the Norfolk has drop ears, or ears that seem to be folded over".
So a person can learn the distinction from a single sentence. SOTA models, by contrast, are merely doing black-box pattern matching on who-knows-what features, and are highly likely to fail dramatically outside the confines of the training dataset.
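
For anyone unsure what the top-1 versus top-5 distinction in the quoted comment means in practice, here is a minimal sketch. The scores, the four-class setup, and the `top_k_accuracy` helper are all made up for illustration (they are not taken from ImageNet or from any paper cited above), and top-2 stands in for top-5 because the toy example only has four classes.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical scores for three images over four classes
# (0 = Norwich terrier, 1 = Norfolk terrier, 2 = cat, 3 = car).
scores = [
    [0.48, 0.46, 0.04, 0.02],  # true class 0: top-1 is right
    [0.51, 0.45, 0.03, 0.01],  # true class 1: top-1 is wrong, top-2 is right
    [0.05, 0.05, 0.80, 0.10],  # true class 2: top-1 is right
]
labels = [0, 1, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

In this toy data the only top-1 miss is the confusion between the two terriers, which a top-k metric with k > 1 forgives. That is the sense in which top-1 over near-identical classes is a much harsher measure than top-5.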