Hacker News new | ask | show | jobs
by randcraw 1610 days ago
Based the following constraints that lie at the center of AI and parallelism, I'd say no -- stochastic gradient pursuit using vector processors like GPUs is inescapable in all future AI advances.

1) All AI is based in search (esp. non-convex, where heuristics are insufficient to provide a global convex solution), and thus is inevitably implemented using iteration, driven locally by gradient-pursuit and globally by... ways to efficiently gather information to optimize the loss function that measures how well that info gain is being refined and exploited.

2) Search that is inherently non-convex and inefficient requires as much compute power as possible, i.e. using supercomputers.

3) All supercomputer-based solutions to non-convex problems are implemented iteratively, where results are improved not using closed-form math or complete info, but by incremental optimization of partial results that aggregate with the iterations, like repeated stochastic gradient descent that creates and enhances 'resonant' clusters of 'neurons'.

4) The only form of supercomputing that has proven to scale up at anywhere near indefinitely is data-parallelism (a dataflow-specific form of SIMD) -- where the search space is spread as evenly (and naively) as possible across as many processing elements as possible.

5) Vector processing hardware like GPUs implement data-parallelism as well as any HPC architecture yet devised.

Thus, I believe that AI is stuck with GPUs, or equivalent meshes of vector processors, indefinitely.

2 comments

Completely agree. I'd only add one point: Quantum computing -- when and if we ever build them with enough error correction to be practical -- can replace GPUs for a few very specific kinds of search where the statistical shotgun approach that quantum computers do so well can be exploited. One canonical example is the quantum FFT (Shor's algorithm) which seems to work quite well for finding the factors of large composite numbers.

Quantum computers are not magic and they are by no means general-purpose, but if your search mechanism matches the kind of search quantum computers are good at, they can greatly exceed the speed of GPUs.

Could something like 'lottery ticket' review approach make this process faster? I might be wrong but I am not sure if that is parallelized.
I'm not sure what you mean by 'lottery ticket' review -- presumably some sort of random global search shortcut, like mutation in genetic algorithms.

If so, then no, using randomness to disrupt search only resets the search to continue at a different (random) spot in the global search space. That kind of approach certainly can escape local minima, but it doesn't diminish the amount of total time/effort required to do (non-convex) global search.

I think the long history of our attempting to use randomness to improve search has shown that all naive approaches inevitably fail unless the number of processors available can be scaled up to 'equal' the number of decisions required to explore the entire search space (as quantum computing seeks to do). But maybe the formal resolution of that question will come only after we prove P != NP.

Sorry I meant the 'lottery ticket' hypothesis. https://arxiv.org/abs/1803.03635v1

The lottery tickets are random but they are basically - as I understand - smaller graphs that have the same strength as a larger graph.