by mola 1482 days ago
How is this related? OP was complaining that most of these compute-heavy papers don't really show much advance theory-wise. They say it's obvious by now that throwing more compute at a problem slightly pushes the state of the art. The comments there add that these fancy papers are overshadowing more important work by showing some pretty pictures and running the PR machine at full power.
2 comments

I see there a rant that others have compute and he doesn't.

There are plenty of papers showing theoretical advances.

Some even show that big compute is necessary, like "A Universal Law of Robustness via Isoperimetry":

> Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied. A puzzling phenomenon in deep learning is that models are trained with many more parameters than what this classical theory would suggest. We propose a theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely we show that smooth interpolation requires d times more parameters than mere interpolation, where d is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial size weights, and any covariate distribution verifying isoperimetry. In the case of two-layers neural networks and Gaussian covariates, this law was conjectured in prior work by Bubeck, Li and Nagaraj. We also give an interpretation of our result as an improved generalization bound for model classes consisting of smooth functions.

https://arxiv.org/abs/2105.12806
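
To put a rough number on the "d times more parameters" claim in that abstract, here is a back-of-the-envelope sketch. It ignores constants and log factors, and the function names and the example values of n and d are my own illustrative assumptions, not figures from this thread.

```python
def params_for_mere_interpolation(n: int) -> int:
    # Classical counting argument: roughly one parameter per equation (data point).
    return n


def params_for_smooth_interpolation(n: int, d: int) -> int:
    # The abstract's claim: smooth interpolation needs about d times more
    # parameters than mere interpolation (constants and log factors ignored).
    return d * params_for_mere_interpolation(n)


if __name__ == "__main__":
    n, d = 60_000, 784  # hypothetical: 60k samples, 28x28-pixel inputs
    mere = params_for_mere_interpolation(n)
    smooth = params_for_smooth_interpolation(n, d)
    print(f"mere interpolation:   ~{mere:,} parameters")
    print(f"smooth interpolation: ~{smooth:,} parameters")
```

Under those assumptions the gap is a factor of d, so the smooth regime sits in the tens of millions of parameters for a dataset of that size, which is the sense in which the paper argues that scale is necessary rather than incidental.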

I mean, you can be upset at the universe for the way it is, but what is the point?

I sometimes wonder whether people saying this sort of thing actually read ML papers, because the idea that the majority of papers do scaling and nothing else is just plainly not true. There are tons of papers exploring creative ideas, including the one mentioned in the critique, and even the subset of papers that are primarily about scale typically involve meaningful scientific discovery.