Hacker News
by fxtentacle 2354 days ago
Most AI stuff is just horribly over-hyped, so the sad truth might be that what you are seeing is the state of the art and nobody else has found a better way yet.

As a practical example, take figuring out where a given pixel moves from one video frame to the next. On real-world videos, the best known algorithms get about 50% of the pixels correct. With clever filtering, you can maybe bump that to 60 or 70%, but in any case you will be left with a 30%+ error rate.
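For anyone wondering what "pixels correct" could mean here, below is a minimal sketch of one way to score it. It assumes a pixel counts as correct when its endpoint error (the distance between the predicted and the true motion vector) is within a threshold. The 3 px default, the function name `fraction_correct` and the toy data are all assumptions made for illustration, not something the numbers above were computed with.

```python
import math

def fraction_correct(flow_pred, flow_true, max_error_px=3.0):
    """Share of pixels whose endpoint error is within max_error_px.

    Both arguments are equal-length sequences of (dx, dy) motion vectors,
    one entry per pixel. The threshold is an assumed, illustrative value.
    """
    if len(flow_pred) != len(flow_true):
        raise ValueError("flow fields must have the same number of pixels")
    if not flow_pred:
        raise ValueError("flow fields must not be empty")
    correct = 0
    for (pu, pv), (tu, tv) in zip(flow_pred, flow_true):
        # Endpoint error: Euclidean distance between predicted and true motion.
        if math.hypot(pu - tu, pv - tv) <= max_error_px:
            correct += 1
    return correct / len(flow_pred)

# Toy data: four pixels, two of them badly wrong.
pred = [(1.0, 0.0), (5.0, 5.0), (0.0, 0.0), (2.0, 2.0)]
true = [(1.5, 0.0), (0.0, 0.0), (0.0, 4.0), (2.0, 2.5)]
print(fraction_correct(pred, true))
```

A stricter or looser threshold changes the headline percentage a lot, so any "X% correct" figure only means something together with the threshold used.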

NVIDIA / Google / Microsoft / Amazon will tell you that you need to buy or rent more GPUs or cloud GPU servers and do more training with more data. And there are plenty of companies in cheap-labor countries offering to do your data annotation at a very reasonable rate. But both groups are just trying to sell to you. They don't care whether it will solve your problem, as long as you're feeling hopeful enough to buy their stuff.

Judging from the bad results that even Google / Facebook / NVIDIA show on benchmarks, having a near-unlimited budget is still not enough to make ML work nicely.

Oh, and image recognition networks like YOLO have their own flavor of problems: https://www.inverse.com/article/56914-a-google-algorithm-was...


> As a practical example, take figuring out where a given pixel moves from one video frame to the next. On real-world videos, the best known algorithms get about 50% of the pixels correct. With clever filtering, you can maybe bump that to 60 or 70%, but in any case you will be left with a 30%+ error rate.

What do you mean by this? Optical flow isn't really a learning problem. It's a classical problem with very good classical algorithms:

https://www.mia.uni-saarland.de/Publications/brox-eccv04-of....

https://people.csail.mit.edu/celiu/OpticalFlow/

https://github.com/pathak22/pyflow

It used to be. Then the AI fanboys arrived and started treating it like a learning problem.

https://arxiv.org/abs/1612.01925

https://arxiv.org/abs/1709.02371

https://arxiv.org/abs/1904.09117

BTW, the classical algorithms also deal very badly with noise and repetitive textures, e.g. a video of a forest in the afternoon.

Ever tried "DIS optical flow" in OpenCV? Works like a charm for me even in challenging conditions.
Not yet, thanks for the suggestion :)