There are a handful of papers from the 90s that show this, but it wasn't recognized for what it was at the time. Double descent is REALLY crazy to me, coming from a classical background.
Sure, but those are identification approaches from econometrics and matrix analysis. Applying them to neural networks is new-ish in the zeitgeist, and that zeitgeist didn't exist in the 1990s the way it does today.