| > Investment Strategy: Organizations should invest more in computing infrastructure than in complex algorithmic development. > Competitive Advantage: The winners in AI won’t be those with the cleverest algorithms, but those who can effectively harness the most compute power. > Career Focus: As AI engineers, our value lies not in crafting perfect algorithms but in building systems that can effectively leverage massive computational resources. That is a fundamental shift in mental models of how to build software. I think the author has a fundamental misconception about what making the best use of computational resources requires. It's algorithms. His recommendation boils down to not doing the one thing that would allow us to make the best use of computational resources. His assumptions would only be correct if all the best algorithms were already known, which is clearly not the case at present. Rich Sutton said something similar, but when he said it, he was thinking of old engineering-intensive approaches, so it made sense in the context in which he said it and for the audience he directed it at. It was hardly groundbreaking either; the people he wrote the article for all thought the same thing already. People like the author of this article don't understand the context and are taking his words as gospel. There is no reason to think that different machine learning methods won't supplant the current ones, and it's certain they won't be found by people who are convinced that algorithmic development is useless. |
I dare say GPT-3 and GPT-4 are the only recent examples where pure compute produced a significant edge compared to algorithmic improvements. And that edge lasted a solid year before others caught up. Even among the recent improvements:
1. Gaussian splatting, a hand-crafted method, blew the entire field of NeRF models out of the water.
2. DeepSeek R1 trains reasoning without a reasoning dataset.
3. Inception Labs' 16x speedup comes from using a diffusion model instead of next-token prediction.
4. DeepSeek distillation compresses a larger model into a smaller model.
That sets aside the introduction of the Transformer and diffusion models themselves, which triggered the current wave in the first place.
AI is still a vastly immature field. We have not explored it systematically; we have mostly tested things at random. Good ideas are being dismissed in favor of whatever happened to work elsewhere. I suspect we are still missing a lot of fundamental understanding, even at the activation function level.
We need clever ideas more than compute. But the stock market seems to have mixed them up.