HN Mirror

by aspis 4882 days ago

I attended a talk by Quoc Le at UCSD recently, and he made the case that it is necessary to test algorithms at large scale, rather than spending too much time on them at small scale.

He presented a graph comparing the accuracy of several models as the number of features was scaled up to the tens of thousands; his point was that some models that work best with a small number of features fall off as the number is scaled up. Unfortunately the slides on his web page are outdated, so I haven't been able to find that reference. I'd be very happy if one of you knows which paper he was referring to. In the old slides he refers to this paper, which makes much the same point: http://ai.stanford.edu/~ang/papers/nipsdlufl10-AnalysisSingl... It shows how simple unsupervised models with dense feature extraction reach the state-of-the-art performance of more complex models.

Of course, I can see how it makes sense to at least do some small-scale prototyping, to work out kinks like you say, but the lesson is that if you are planning to do large-scale machine learning you can't necessarily use small-scale tests as a good guide to large-scale performance. It's certainly promising if you get very good accuracy, speed, or both at small scale, though neither will necessarily carry over to large scale. On the flip side, if your method is worse than the state of the art at smaller scales, that doesn't mean it won't beat the state of the art at large scales.

1 comment

jmares 4881 days ago

Data shows, as you say, that small scale performance is no indicator of large scale performance.

How then do you decide which projects are worth trying on the large scale?
