by dognotdog 1667 days ago
It is kind of disappointing that it doesn't seem to map any of the folds in the garment, the individual fingers, or other details. It also seems to get the depth around the knob by the right elbow quite wrong. All in all, there is no apparent order-of-magnitude improvement (if any?) over the single-image "AI" algorithms that have been shown before.

I’m not sure how good our instinctive expectations are. The folds in the shirt, for example, are very prominent in the normal photo because of the shadows. But the difference in depth really isn’t that large.

Say you have the 255 shades of gray in RGB, and you want to spread them evenly over the distances of 1-5m. That would give you a 1-step increase in brightness for every 1.6cm or so, which happens to be pretty close to what I believe these folds’ magnitude might be. I’m not entirely sure how prominent the difference would be to the naked eye. IIRC, the MPAA considers even 50 to be plenty.
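
For what it's worth, that arithmetic can be checked in a few lines. This is only a sketch: the 1-5 m range and the 255 steps are the numbers from the comment, while the function name `linear_depth_step_cm` is made up for illustration.

```python
def linear_depth_step_cm(near_m: float, far_m: float, steps: int) -> float:
    """Depth (in cm) covered by one gray level when levels are spread evenly."""
    return (far_m - near_m) / steps * 100.0


# 4 m of range divided into 255 equal steps.
print(round(linear_depth_step_cm(1.0, 5.0, 255), 2))  # -> 1.57
```

So a fold would need to be roughly a centimetre and a half deep to move the linear 8-bit depth value by even a single step.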

I’m leaving out lots of details (pun not intended, but appreciated): you’d spread your resolution logarithmically, for example, not linearly. And, of course, you could work with more than the resolution of 255. But it’s a different domain and I believe some intuitions are off if you compare it with the x and y dimensions.
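
To illustrate the logarithmic spacing, here is a rough sketch under the same assumed 1-5 m range and 255 steps. The helper `log_depth_levels` is hypothetical, and geometric spacing is just one way to read "logarithmically"; nothing here is known about how the actual depth map was encoded.

```python
def log_depth_levels(near_m: float, far_m: float, steps: int) -> list[float]:
    """Depth assigned to each gray level when levels are spaced geometrically."""
    ratio = far_m / near_m
    return [near_m * ratio ** (k / steps) for k in range(steps + 1)]


levels = log_depth_levels(1.0, 5.0, 255)

# Size of a single gray-level step at the near and far ends of the range.
first_step_cm = (levels[1] - levels[0]) * 100.0
last_step_cm = (levels[-1] - levels[-2]) * 100.0
print(f"near step: {first_step_cm:.2f} cm, far step: {last_step_cm:.2f} cm")
```

With this spacing the steps are finer than the linear 1.6 cm close to the camera and coarser far away, so how visible a fold is depends on how far away the shirt is.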

I'm not so convinced I'm seeing the limits of resolution, either angular or depth.

Using parallax to calculate depth undoubtedly has fundamental limitations for far-away details, and mapping to an 8-bit depth buffer is another very reductive step in that regard. (Regardless, I'd expect even the folds to show up, at least qualitatively, if I looked at an 8-bit rendering of a 3D scene's z-buffer; the gradient contour steps are clearly visible, and dense, yet fail to follow the folds, indicating that the underlying depth data simply doesn't track them at all.)
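
As a rough illustration of the parallax limitation, the textbook two-view relation Z = f * B / d implies that the depth error for a given disparity error grows with the square of the distance. The sketch below assumes that simple pinhole stereo model; the focal length and baseline are invented placeholder values, not the parameters of the camera being discussed.

```python
def depth_error_m(depth_m: float, focal_px: float, baseline_m: float,
                  disparity_error_px: float = 1.0) -> float:
    """Approximate depth uncertainty: dZ ~ Z^2 / (f * B) * dd, from Z = f * B / d."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


FOCAL_PX = 1000.0   # hypothetical focal length in pixels
BASELINE_M = 0.01   # hypothetical 1 cm spacing between lenses

for z in (1.0, 2.0, 5.0):
    err_cm = depth_error_m(z, FOCAL_PX, BASELINE_M) * 100.0
    print(f"{z:.0f} m: about {err_cm:.0f} cm per pixel of disparity error")
```

Doubling the distance quadruples the error, which is why small-baseline parallax gets coarse quickly for anything that is not close to the lens.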

Let's take the sleeves then -- clearly a large difference in relative depth, yet they blend into the rest of the garment. My impression is very much that of standard depth reconstruction "AI" that more or less guesses depths of a few image features, and does some DNN magic around where to blend smoothly and where to allow discontinuities, with the usual "weirdness" around features that don't quite fit the training sets.

Possibly all we can get out of this "parallax" method of depth reconstruction isn't a whole lot better than just single image deep learning, which would not surprise me, as it ultimately relies on the same approach for correctly recognizing and mapping image features across the 9 constituent images in the first place, vs. a true lightfield sensor that captures the actual direction of incoming light.

Look at the shirt between the sleeve and whatever the cook is sprinkling. There's an obvious, soft "bump" there that doesn't seem to correspond to anything in the actual geometry - I'm betting it's an interpolation artefact.