Hacker News
by blt 2420 days ago
IMO this is not a problem. The people building insanely huge models are expanding the set of tasks that can be done by a computer. Who cares how much memory it takes?

Historically, computationally expensive methods eventually become cheap. In the 1980s, researchers had access to Crays to develop physics models, graphics, etc. requiring lots of floating point math and memory. Meanwhile, on home computers, game programmers had to implement all their math in fixed point. Nowadays, game engines run the same algorithms that were running on the Crays before.

Same with learning. It's great to use tricks to make models fit on phones. Even better: use tricks to bring the training of new models within the budget of a small academic research lab. That doesn't mean we should invalidate all the work that requires a huge cluster.

2 comments

> IMO this is not a problem. The people building insanely huge models are expanding the set of tasks that can be done by a computer. Who cares how much memory it takes?

But are they? The example in the article describes an incremental improvement in a benchmark in exchange for a massive increase in training time.

Deep learning has achieved success on a number of tasks that computers had previously been unable to do. Since that initial period of success, it has been an area of debate whether deep learning has expanded its basic area of applicability or whether it has only improved incrementally on its initial achievements.

And if it is true that deep learning is stuck on just expanding what it's already doing, the next fundamental advance might come from one person with one machine rather than a massive team with a massive machine. Consider that neural nets as a theory had been around since the 1990s if not the 1960s, but the fundamental advance in DL came when grad students could use GPUs in the 2010s, not when massively parallel machines came into existence (quite a bit earlier).

Here, the further wrinkle is that Moore's law is gradually ending. We won't have access to that much more computing power twenty years hence, so making less do more does make sense.

> And if it is true that deep learning is stuck on just expanding what it's already doing, the next fundamental advance might come from one person with one machine rather than a massive team with a massive machine. Consider that neural nets as a theory had been around since the 1990s if not the 1960s, but the fundamental advance in DL came when grad students could use GPUs in the 2010s, not when massively parallel machines came into existence (quite a bit earlier).

One thing that I can't help wondering, however sci-fi it sounds, is whether model simplifications like the ones in this post might lead to models humans can fully understand, which then might lead to new styles of traditional programming - opening up whole new ways of doing things.

I disagree. There are lots of advancements that DL has yet to fully realize with even the current technology. You're focused on commercial applications, but applying neural network models, especially CV models, to many types of scientific research has yet to be explored due to lack of funding.

I'd like to think I put my comments "as potential problems", since I can't claim to follow everything that's done in deep learning.

Still, to continue the devil's advocate position: deep learning comes up with a lot of things that are suggestive but not tight enough in their approximation to be useful.

I would guess there are a huge number of correlations that seem plausible but aren't really causations. You can employ a monster stream of intelligent-seeming claims and predictions and find they don't yield any progress in any firm scientific domain. The application of deep learning to finding cancer and to related diagnostic processes has been "exciting and promising" for a long time but has effectively yielded nothing so far, because "quite accurate in highly controlled situations" turns out to seldom be that useful, at least not so far.

I find this weird too; the question of "miniaturization" should come after the theoretical stage is satisfied. Is this coming from a line of thinking where capitalistic sense avoids high costs, or from a strict design sensibility where optimization is a primary concern? The nuance is tiny but very important.

I agree, but the main reason why "miniaturization" exists is that it can be done in parallel with theoretical developments and allows you to make money off the results (therefore funding more R&D).