| OK, the definition of scalable is crucial here and it causes lots of trouble (this is also a response to several other posts, so forgive me if I don't address your points exactly). Let me try once again: an algorithm is scalable if it can process bigger instances by adding more compute power. E.g. I take a small perceptron and train it on a Pentium 100, then take a perceptron with 10x the parameters on a Core i7 and get better output by some monotonic function of the increase in instance size (it is typically a sublinear function, but that is OK as long as it is not logarithmic). DL does not have that property. It requires modifying the algorithm, modifying the task at hand and so on. And it is not that it requires some tiny tweaking. It requires quite a bit of tweaking. I mean, if you need a scientific paper to make a bigger instance of your algorithm, this algorithm is not scalable. What many people here are talking about is whether an instance of the algorithm can be created (by a great human effort) in a very specific domain to saturate a given large compute resource. And yes, in that sense deep learning can show some success in very limited domains: domains where there happens to be a boatload of data, particularly labeled data. But you see, there is a subtle difference here, similar in some sense to the difference between Amdahl's law and Gustafson's law (though not literally). The way many people (including investors) understand deep learning is this: you build a model A, show it a bunch of pictures and it understands something out of them. Then you buy 10x more GPUs, build a model B that is 10x bigger, show it those same pictures and it understands 10x more from them. Look, I and many people here understand this is totally naive. But believe me, I have talked to many people with big $ who have exactly that level of understanding. |
However, your last paragraph about how investors view deep learning does not describe anyone in the community of academics, practitioners and investors that I know. People understand that the limiting inputs to improved performance are data, followed closely by PhD labor. Compute power is relevant mainly because it shortens the feedback loop on that PhD labor, making it more efficient.
Folks investing in AI believe the returns are worth it due to the potential to scale deployment, not (primarily) training. They may be wrong, but this is a straw man definition of scalability that doesn't contribute to that thesis.
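For anyone unfamiliar with the Amdahl/Gustafson distinction that the quoted comment alludes to, here is a minimal sketch of the two textbook formulas. Amdahl's law assumes a fixed workload (loosely, "the same pictures"), while Gustafson's law assumes the workload grows with the available compute. The function names and the parallel fraction of 0.9 are assumptions chosen purely for illustration, and as the quoted comment itself says, the analogy to deep learning is not literal.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup for a FIXED problem size.

    The serial part does not shrink, so the speedup is bounded
    by 1 / (1 - parallel_fraction) no matter how many workers are added.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Gustafson's law: scaled speedup when the problem GROWS with the workers.

    The parallel part of the work is assumed to scale with the worker count,
    so the scaled speedup grows linearly in the number of workers.
    """
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, for illustration only
    for n in (1, 10, 100):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
```

With the assumed fraction of 0.9, the fixed-workload speedup can never reach 10x however many workers are added, whereas the scaled-workload speedup keeps growing with the worker count. That is roughly the gap between "same pictures, 10x the GPUs" and "10x the data, 10x the GPUs".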