
Depending on the level you're looking for, I'm always a huge fan of Rockafellar [0] for the more mathematical expositions (they're lovely, clear, and very well thought through).

Even in this case, though, subgradients may not exist: they're only guaranteed to exist for convex functions, and any nonconvex function has, almost by definition, at least one point at which no subgradient exists. In that case one talks about subderivatives instead. These always exist whenever a function is semicontinuous (such as the "if" statement example given in the GGP comment), even if it is not convex [1].
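As a rough illustration of the subgradient point, here is a minimal sketch that tests the (convex-analysis) subgradient inequality f(y) >= f(x0) + g*(y - x0) on a finite grid. The helper name, the grid, and the candidate slopes are all made up for the example, and passing a grid check is only evidence, not a proof.

```python
def valid_slopes(f, x0, candidates, test_points, tol=1e-12):
    """Hypothetical helper: keep the slopes g that satisfy
    f(y) >= f(x0) + g*(y - x0) at every test point."""
    fx0 = f(x0)
    return [g for g in candidates
            if all(f(y) >= fx0 + g * (y - x0) - tol for y in test_points)]

# Assumed grid on [-1, 1] and candidate slopes in [-2, 2].
test_points = [i / 100 for i in range(-100, 101)]
candidates = [i / 10 for i in range(-20, 21)]

# Convex case: |x| at 0 admits every slope in [-1, 1].
convex_slopes = valid_slopes(abs, 0.0, candidates, test_points)

# Nonconvex case: -|x| at 0 admits no slope at all.
concave_slopes = valid_slopes(lambda y: -abs(y), 0.0, candidates, test_points)

print(min(convex_slopes), max(convex_slopes), concave_slopes)
```

For |x| the surviving slopes fill out [-1, 1], while for -|x| every candidate fails: the inequality at y = 1 forces g <= -1 and at y = -1 forces g >= 1.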

All of this is to say, the subject of general variational analysis is mathematically nice, but computationally extraordinarily difficult.

The first proofs that GD converges, even locally, to local optima (not just stationary points!) with high probability emerged only recently, even for smooth, differentiable functions, and the results are rather weak in the sense of limits [2]. Query-hardness has also been shown for many examples in [3], even with unbounded computational time and even when the functions are smooth.
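To make the "local optima, not stationary points" distinction concrete, here is a toy sketch. The function f(x, y) = x^2/2 + y^4/4 - y^2/2, the step size, and the iteration count are all assumptions picked for illustration, and the code does not check the hypotheses of the results in [2]. This f has a saddle at (0, 0) and local minima at (0, 1) and (0, -1).

```python
def grad(x, y):
    """Gradient of the assumed toy function f(x, y) = x^2/2 + y^4/4 - y^2/2."""
    return x, y**3 - y

def gradient_descent(x, y, step=0.1, iters=2000):
    """Plain fixed-step gradient descent; returns the final iterate."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

# Starting exactly on the line y = 0, GD slides into the saddle and stays there.
on_stable_manifold = gradient_descent(0.5, 0.0)

# Any tiny perturbation off that line escapes to a local minimum instead.
perturbed = gradient_descent(0.5, 1e-3)

print(on_stable_manifold, perturbed)
```

The bad starting points (the line y = 0) form a measure-zero set here, which is the flavor of statement being made: a random initialization misses them, but nothing above says how long the escape takes.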

The non-smooth case is harder still: the query complexity is exponential in the number of dimensions even when you require that the function cannot "vary too much" locally (i.e., that it be L-Lipschitz) [4].
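To get a feel for the blow-up: as I recall the bound in [4], for a function on the unit box in n dimensions that is L-Lipschitz in the sup norm, any method that only queries function values needs at least floor(L / (2*eps))^n queries to guarantee accuracy eps, for eps < L/2. The helper below is a made-up name that just evaluates that expression; the exact constants should be checked against [4].

```python
import math

def zero_order_lower_bound(L, eps, n):
    """Hypothetical helper: floor(L / (2*eps)) ** n, stated for 0 < eps < L/2."""
    if not 0 < eps < L / 2:
        raise ValueError("bound is stated for 0 < eps < L/2")
    return math.floor(L / (2 * eps)) ** n

# Assumed values L = 2 and eps = 0.25, so the base of the exponent is 4.
for n in (1, 2, 5, 10, 20):
    print(n, zero_order_lower_bound(L=2.0, eps=0.25, n=n))
```

With these values the count is 4^n, which is already over a million queries at n = 10 for a very modest accuracy.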

-----

[0] For example, Variational Analysis covers a good amount of this stuff https://sites.math.washington.edu/~rtr/papers/rtr169-VarAnal...

[1] cf. Chapter 8 in [0].

[2] https://arxiv.org/abs/1602.04915

[3] https://arxiv.org/abs/1912.02365 in the stochastic case, and https://arxiv.org/abs/1710.11606 along with https://arxiv.org/abs/1711.00841 in the nonstochastic case

[4] See, e.g., Section 1.1.3 in Nesterov's Introductory Lectures on Convex Optimization (can be easily found online).