| Optimal learning is an interesting problem in computer science because it is fundamentally bound by geometric space complexity rather than computational complexity. You can bend the curve but the approximations degrade rapidly and still have a prohibitively expensive exponential space complexity. We have literature for this; a lot of the algorithmic information theory work in AI was about characterizing these limits. The annoying property of prohibitively exponential (ignoring geometric) space complexity is that it places a severe bound on computational complexity per unit time. The exponentially increasing space implies an increase in latency for each sequentially dependent operation, bounded at the limit by the speed of light. Even if you can afford the insane space requirements, your computation can’t afford the aggregate latency for anything useful even for the most trivial problems. With highly parallel architectures this can be turned into a latency-hiding problem to some extent but this also has limits. This was thoroughly studied by the US defense community decades ago. The tl;dr is that efficient learning scales extremely poorly, more poorly than I think people intuit. All of the super-intelligence hard-takeoff scenarios? Not going to happen, you can’t make the physics work without positing magic that circumvents the reality of latencies when your state space is unfathomably large even with unimaginably efficient computers. I harbor a suspicion that the cost of this scaling problem, and the limitations of wetware, has bounded intelligence in biological systems. We can probably do better in silicon than wetware in some important ways but there is not enough intrinsic parallelism in the computation to adequately hide the latency. Personally, I find these “fundamental limits of computation” things to be extremely fascinating. |