The Wikipedia article is incorrect. The caching, or memoization, is a technique heavily utilized in dynamic programming. Page 183 of Algorithms [0] has an explanation of what memoization is and how it is used in dynamic programming. This StackOverflow answer [1] also has a good explanation of the differences between DP and memoization. Any decent algorithms textbook will also have a section dedicated to DP.
> Then why did recursion work so well with divide-and-conquer? The key point is that in
divide-and-conquer, a problem is expressed in terms of subproblems that are substantially
smaller, say half the size. For instance, mergesort sorts an array of size n by recursively
sorting two subarrays of size n/2. Because of this sharp drop in problem size, the full
recursion tree has only logarithmic depth and a polynomial number of nodes.
> In contrast, in a typical dynamic programming formulation, a problem is reduced to
subproblems that are only slightly smaller—for instance, L(j) relies on L(j − 1).
I think I got it. The distinction isn't that clear cut anyway. For example, I could implement a mergesort with caching and say that I'm technically doing dynamic programming, even though the cache would get hit exactly zero times. It would sort of be "trivially" dynamic programming. On the other hand, the first time I calculated fibonacci numbers recursively, I've added a cache. This would be "memoization". Or in other words, wikipedia article is correct, but some problems don't benefit from the caching.
[0] https://people.eecs.berkeley.edu/~vazirani/algorithms/chap6....
[1] http://stackoverflow.com/a/6185005