Hacker News
by 300bps 1515 days ago
Honestly at this point it kind of is magic.

How much of that magic is smoke and mirrors? For example, the FIRST Tech Challenge (from FIRST Robotics) used TensorFlow to train a model to tell a white sphere from a golden cube using a mobile phone's on-board camera.

The first time I saw it, it did seem pretty magical. Then in testing I realized it was basically a glorified color sensor.

I think these things make for great and astonishing demos but don't live up to their promise. Happy to hear about real-world examples that I can look into, though.

3 comments

Even if it were practically useless (which it is not, although the practical applications are less impressive than the research achievements at this point), it would be magical. Deep learning has dominated ImageNet for a decade now, for example. One reason this is magical is that the SOTA models are extremely overparameterized. There exist weights that perform perfectly on the training data but give random answers on the test data [0]. But in practice these degenerate weights are not found during SGD. What's going on there? As far as I know, there is no satisfying explanation.

[0] https://arxiv.org/abs/1611.03530
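
To make the "perfect on train, random on test" point concrete, here is a toy sketch in plain Python. It is not the setup from [0]: it assumes a linear model with more weights (200) than training examples (30), Gaussian inputs, and labels that are pure coin flips. The minimum-norm interpolating solution and every name below (`fit_min_norm`, `make_data`, the sizes) are illustrative choices.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def make_data(rng, n, d):
    # Inputs are Gaussian noise; labels are coin flips independent of the inputs.
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    return X, y

def fit_min_norm(X, y):
    # With more weights than examples, w = X^T (X X^T)^{-1} y fits y exactly.
    K = [[dot(a, b) for b in X] for a in X]
    alpha = solve(K, y)
    return [sum(alpha[i] * X[i][j] for i in range(len(X)))
            for j in range(len(X[0]))]

def accuracy(w, X, y):
    return sum((dot(w, x) > 0) == (t > 0) for x, t in zip(X, y)) / len(y)

rng = random.Random(0)
X_train, y_train = make_data(rng, n=30, d=200)
X_test, y_test = make_data(rng, n=200, d=200)

w = fit_min_norm(X_train, y_train)
train_acc = accuracy(w, X_train, y_train)  # memorized: every label reproduced
test_acc = accuracy(w, X_test, y_test)     # nothing to learn: near chance
print(train_acc, test_acc)
```

The weights memorize noise perfectly, so training accuracy alone says nothing about generalization. The open question in the comment is why SGD on real data tends not to land on weights like these.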

If you look at these “degenerate” parameterizations, they’re clearly islands in the sea of weight parameter space. It’s clear that what you’re searching for is not a “minimum” per se but an amorphous, fuzzy, blobby manifold. Think of it like sculpting a specific 3D shape out of clay. Sure, there are exact moves to sculpt the shape, but if you’re just gently forming the clay you can get very close to the final form but still have some rough edges.

As for a formal analysis, I just can’t imagine there existing a formal analysis of ML that can describe the distinctly qualitative aspects of it. It’s like coming up with physics equations to explain art.

I mentored an FTC team that was using the vision system this year, and my overall impression was that the TensorFlow model was absolute garbage and probably performed worse than a simple "identify blobs by color" algorithm would have.

The vision model was tolerably decent at tracking incremental updates to object positioning, but for some reason would take 2+ seconds to notice that a valid object was now in view (which is quite a lot, in the context of a 30s autonomous period), and frequently identified the back walls of the game field as giant cubes.
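
For reference, a simple "identify blobs by color" baseline of the kind mentioned above could look roughly like the sketch below. It assumes the frame is a nested list of `(r, g, b)` tuples; the thresholds in `is_gold` and `is_white`, and all function names, are made up for illustration and would need tuning on real camera frames.

```python
from collections import deque

def is_gold(px):
    r, g, b = px
    return r > 150 and g > 100 and b < 100  # assumed threshold, not calibrated

def is_white(px):
    return min(px) > 200  # assumed threshold, not calibrated

def largest_blob(image, predicate):
    """Return (pixel_count, (row, col) centroid) of the largest 4-connected
    region whose pixels satisfy predicate, or (0, None) if there is none."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    best = (0, None)
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0] or not predicate(image[r0][c0]):
                continue
            # Breadth-first flood fill from this unvisited matching pixel.
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < h and 0 <= nc < w and not seen[nr][nc]
                            and predicate(image[nr][nc])):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(pixels) > best[0]:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                best = (len(pixels), (cy, cx))
    return best

def classify(image):
    gold = largest_blob(image, is_gold)[0]
    white = largest_blob(image, is_white)[0]
    if gold == 0 and white == 0:
        return "none"
    return "gold cube" if gold >= white else "white sphere"

def make_image(h, w, patch_color=None, top=0, left=0, size=0):
    """Grey test frame with an optional square patch of patch_color."""
    img = [[(50, 50, 50)] * w for _ in range(h)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            img[r][c] = patch_color
    return img

frame = make_image(6, 6, patch_color=(220, 180, 30), top=1, left=1, size=2)
print(classify(frame), largest_blob(frame, is_gold))
```

A baseline like this has the same failure mode described here: a gold-colored wall is just a very large gold blob, so a real version would likely need a maximum-size or shape check on top.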

There's a big difference between a glorified color sensor and a well-trained deep learning model (I can say this with authority because I hired an intern at Google to help build one of those detectors). It's still not magic, but a well-trained network is robust and generalizable in a way that a color sensor cannot be.