if you want to be rigorous about it, you can often talk in terms of elements of the set of subgradients rather than gradients, and the convergence proofs for many of the popular optimization algorithms still go through.
Depending on the level you're looking for, I'm always a huge fan of Rockafellar [0] for the more mathematical expositions (they're lovely, clear, and very well thought through).
Even in this case, though, subgradients may not exist (they're only guaranteed to exist for convex functions and any nonconvex function has, almost by definition, at least one point for which there exists no subgradient), in which case one talks about sub-derivatives instead. These always exist whenever a function is semicontinuous (such as the "if" statement example given in the GGP comment), even if it is not convex [1].
All of this is to say, the subject of general variational analysis is mathematically nice, but computationally extraordinarily difficult.
The first proofs of even local convergence of GD to local optima (not stationary points!) with high probability, even in the case of smooth, differentiable functions, only recently emerged as well, and the results are rather weak in the sense of limits [2]. Query-hardness has also been shown for many examples (even with unbounded computational time) in [3] even when the functions are smooth.
The non-smooth case becomes even harder and you have exponential-size query complexity in the number of dimensions even when you restrict that the function cannot "vary too much," locally (i.e., if it is required to be L-Lipschitz) [4].