HN Mirror


by jerf 638 days ago

Computer vision has become very good, but whatever exactly it is doing, it is still not the same as human vision, and this is one of the places where that really sticks out. A human can learn to recognize a real-world object from a bunch of super-perfectly-pristine 3D renders. Heck, we can do it from a single pristine 3D render, and not even necessarily a high-quality or high-resolution one. Whatever computer vision is doing, it is something less "powerful" than human vision, which we then make up for by throwing a lot more computation and resources at it, and that covers over a lot of the problems.

If you can figure out exactly why that is, you will at the very least get a very well-cited paper out of it, if not win some sort of award. It's not a complete mystery what the problem is, but the solution is unknown.

Because we don't know what the difference is, we can't fix up the 3D renders. "Just make it noisy" certainly isn't it. (We in fact have a lot of experience throwing noise at things; the whole Stable Diffusion style of AI image generation is based on that principle at its core.) It has to be the right sort of noise, or distortion, to represent the real world, and nobody can tell you exactly what that is.