I think the reason people use gradient descent is that the datasets are too large to solve for all inputs simultaneously. In theory it isn't impossible, really.
Do you mean to say that it is possible to fit your parameters over all inputs at once without gradient descent? I'm somewhat confused, as I don't think that would be possible in the general case (e.g. nonlinear problems are hard to crack without resorting to an iterative procedure like gradient descent). I can see that gradient descent might still make sense for problems that do have clean analytic solutions (if that's what you meant), as those solutions often turn out to be impractical to compute at scale. Linear regression is a good example: it has a nice closed-form expression if the solution exists, but the cost scales poorly because the naive implementation requires a matrix inversion, so a different method might be used for a large problem. Gradient descent could be a candidate.
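To make the linear regression point concrete, here is a minimal Python sketch that fits y = w*x + b both ways. The toy data and function names are made up for illustration. With a single feature the normal equations collapse to a ratio of sums. With d features you would instead solve the d×d system XᵀXw = Xᵀy, which is the part that scales poorly. Gradient descent only ever needs the residuals and a step size.

```python
def closed_form(xs, ys):
    """Ordinary least squares for y = w*x + b via the 1-D normal equations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = my - w * mx
    return w, b


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize mean squared error with full-batch gradient descent."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(closed_form(xs, ys))       # exact answer in one pass
print(gradient_descent(xs, ys))  # converges toward the same (w, b)
```

Both routes agree here. The difference only matters once the number of features (and hence the linear system) gets large.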
I think gradient descent is attractive because it's a memoryless process at the batch level - you can process training data in batches instead of processing the entire dataset in one go, without any explicit tracking of the previous batch history. This is a great feature when the scale of your dataset is mind-boggling. I think this is what you were suggesting?
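A minimal sketch of the batch-at-a-time idea, assuming plain minibatch SGD on a toy linear model. `batch_stream` is a hypothetical stand-in for a dataset too large to load at once. Each update reads only the current batch and the current (w, b), and nothing else about earlier batches is kept. Variants such as momentum or Adam do carry running averages between batches, so "memoryless" applies strictly to plain SGD.

```python
import random


def batch_stream(n_batches, batch_size, seed=0):
    """Hypothetical data source: yields batches of (x, y) with y = 2x + 1 plus noise."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)
            batch.append((x, y))
        yield batch


def minibatch_sgd(batches, lr=0.1):
    w, b = 0.0, 0.0
    for batch in batches:
        # Gradient of mean squared error on this batch only; the sole state
        # carried from earlier batches is the current (w, b).
        n = len(batch)
        grad_w = 0.0
        grad_b = 0.0
        for x, y in batch:
            r = w * x + b - y
            grad_w += 2.0 * r * x / n
            grad_b += 2.0 * r / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = minibatch_sgd(batch_stream(n_batches=2000, batch_size=32))
print(f"w={w:.3f}, b={b:.3f}")  # should land near w=2, b=1
```

Because the stream is a generator, at most one batch is held in memory at a time, which is what makes this workable when the dataset itself is huge.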
Strictly speaking, if you split the parameter set into batches and iterate over them, optimizing each subset of parameters with a gradient step, it is not plain gradient descent. It is more a combination of coordinate descent (because you first select the subset of coordinates to optimize) and gradient descent.
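For comparison, here is a minimal sketch of block coordinate gradient descent as described above. The parameters are split into blocks, and each step takes a gradient step on one block while the others stay fixed. This splits the parameters, whereas minibatch SGD splits the training examples and updates every parameter at each step. The helper names and toy data are hypothetical.

```python
def grad(params, xs, ys):
    """Full MSE gradient for y = w*x + b, with params = [w, b]."""
    w, b = params
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    return [2.0 / n * sum(r * x for r, x in zip(residuals, xs)),
            2.0 / n * sum(residuals)]


def block_coordinate_gd(params, blocks, xs, ys, lr=0.05, sweeps=2000):
    """Cycle over parameter blocks, updating only one block per step."""
    params = list(params)
    for _ in range(sweeps):
        for block in blocks:
            # For clarity the full gradient is recomputed and only this block's
            # entries are used; a real implementation would compute just the
            # partial derivatives for the block.
            g = grad(params, xs, ys)
            for i in block:
                params[i] -= lr * g[i]
    return params


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
print(block_coordinate_gd([0.0, 0.0], blocks=[[0], [1]], xs=xs, ys=ys))
```

Each block update sees the values the previous block just wrote, which is the Gauss-Seidel flavour of coordinate descent.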
Ah yes, that sounds like the stochastic gradient descent I've been hearing about. That makes a lot of sense for very expensive models. Thanks for the response, nshm. I've recently taken an interest in ML (coming in with some familiarity with optimization), and it's much appreciated to have some 'REPL' in the learning process.