Hacker News
by vegesm 1534 days ago
Actually, MobileNetV3 supports the bitter lesson rather than contradicting it. The point of Sutton's essay is that it isn't worth adding inductive biases (specific loss functions, handcrafted features, special architectures) to our algorithms. If you have lots of data, just feed it into a generic architecture and it will eventually outperform manually tuned ones.

MobileNetV3 uses architecture search, which is a prime example of the above: even the architecture hyperparameters are derived from data. The handcrafted optimizations just concern speed and do not include any inductive biases.

1 comment

"The handcrafted optimizations just concern speed"

That is the goal here: efficient execution on mobile hardware. MobileNet v1 and v2 did similar parameter sweeps, but performed much worse. The main novelty in v3 is precisely the handcrafted changes. I'd treat that as an indication that those handcrafted changes in v3 far exceed what could be achieved with lots of compute in v1 and v2.

Also, I don't think any amount of compute can come up with new efficient non-linearity formulas like h-swish in v3.
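
For anyone who hasn't seen the non-linearity being discussed, here is a minimal sketch in plain Python. As far as I recall from the MobileNetV3 paper, h-swish is x * ReLU6(x + 3) / 6, a piecewise approximation of swish (x * sigmoid(x)) that avoids computing the exponential. The function names below are my own and are not taken from any library.

```python
import math


def relu6(x: float) -> float:
    """ReLU clipped to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x: float) -> float:
    """h-swish: x * ReLU6(x + 3) / 6, with no exponential involved."""
    return x * relu6(x + 3.0) / 6.0


def swish(x: float) -> float:
    """Swish: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Print both activations side by side for a few sample inputs.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  swish={swish(x):+.4f}  h-swish={hard_swish(x):+.4f}")
```

The hard version is exactly zero for x <= -3 and exactly x for x >= 3, and in between it only needs an add, a clamp, and a multiply, which is why it is cheap on mobile hardware.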