You might call it a super-set of recursion-with-caching. It's a common technique, but it's not the whole field - you can also start at the "leaves" and summarize your way up to the "start" (and similar-ish things that don't involve recursive thinking at all). In many ways similar, but not actually the same. No call stack, for starters.
"just caching" is also hard to apply to e.g. the tower of hanoi problem.
The Wikipedia article is incorrect. The caching, or memoization, is a technique heavily utilized in dynamic programming. Page 183 of Algorithms [0] has an explanation of what memoization is and how it is used in dynamic programming. This StackOverflow answer [1] also has a good explanation of the differences between DP and memoization. Any decent algorithms textbook will also have a section dedicated to DP.
> Then why did recursion work so well with divide-and-conquer? The key point is that in
divide-and-conquer, a problem is expressed in terms of subproblems that are substantially
smaller, say half the size. For instance, mergesort sorts an array of size n by recursively
sorting two subarrays of size n/2. Because of this sharp drop in problem size, the full
recursion tree has only logarithmic depth and a polynomial number of nodes.
> In contrast, in a typical dynamic programming formulation, a problem is reduced to
subproblems that are only slightly smaller—for instance, L(j) relies on L(j − 1).
I think I got it. The distinction isn't that clear cut anyway. For example, I could implement a mergesort with caching and say that I'm technically doing dynamic programming, even though the cache would get hit exactly zero times. It would sort of be "trivially" dynamic programming. On the other hand, the first time I calculated fibonacci numbers recursively, I've added a cache. This would be "memoization". Or in other words, wikipedia article is correct, but some problems don't benefit from the caching.
I do recommend you to read this chapter:
https://people.eecs.berkeley.edu/~vazirani/algorithms/chap6....
After reading this chapter, you can try to understand why Dijkstra and Floyd-Warshall shortest path in graph algorithms work.
These classical and fundamental algorithms combine graph theory with dynamic programming.