Hacker News
by jpt4 271 days ago
> statistical learning theory does not adequately model the macro-behavior of very large models.

Might you please elaborate on this? I recognize that "artificial neural networks are lossy de/compression algorithms" does not enumerate the nuances of these structures, but am curious whether anything in particular is both interesting and missing from SLT.

1 comment

SLT typically analyzes empirical risk minimization, which leads to the bias-variance decomposition and a single test-error minimum: as model complexity grows, the monotonically decreasing bias supposedly balances against the monotonically increasing variance. We now know this does not accurately describe overparameterized models, which exhibit double descent and other phenomena such as grokking. To explain these you have to look past classical statistics to statistical mechanics.
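
For anyone who wants to see double descent rather than take it on faith, here is a minimal sketch. It assumes a toy setup of my own choosing (random ReLU features fit by minimum-norm least squares on a noisy linear target), so the function names, sizes, and noise level are all illustrative and not taken from any particular paper.

```python
import numpy as np


def heldout_mse(n_features, seed, n_train=40, n_test=500, d=10, noise=0.5):
    """Held-out MSE of a least-squares fit on random ReLU features (toy setup)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)                        # hypothetical linear target
    X_train = rng.normal(size=(n_train, d))
    X_test = rng.normal(size=(n_test, d))
    y_train = X_train @ w_true + noise * rng.normal(size=n_train)
    y_test = X_test @ w_true                           # noiseless targets for evaluation

    W = rng.normal(size=(d, n_features)) / np.sqrt(d)  # fixed random first layer
    phi_train = np.maximum(X_train @ W, 0.0)
    phi_test = np.maximum(X_test @ W, 0.0)

    # With fewer features than samples this is ordinary least squares; with
    # more features than samples lstsq returns the minimum-norm interpolant.
    beta, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    return float(np.mean((phi_test @ beta - y_test) ** 2))


def mean_heldout_mse(n_features, n_seeds=20):
    """Average the held-out MSE over several independent random draws."""
    return float(np.mean([heldout_mse(n_features, s) for s in range(n_seeds)]))


if __name__ == "__main__":
    for p in (5, 10, 20, 40, 80, 400, 2000):
        print(f"features={p:5d}  mean held-out MSE={mean_heldout_mse(p):.3f}")
```

With these settings the error should spike near features == n_train (the interpolation threshold) and fall again beyond it, which the classical U-shaped picture does not predict. This only illustrates the phenomenon; it says nothing about grokking or about the statistical-mechanics explanations.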