by joshvm 2272 days ago
It would be nice if there were more information about where the mask comes from. We want a segmentation map for people, so this technique basically takes the layer activation map for the "person" class (which is id 0 in COCO) and thresholds it to get foreground/background. If you changed the mask index, this would respond to other object types; as written, the code only works for humans.
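To make the thresholding step concrete, here is a minimal sketch on synthetic data. It assumes the model output is a per-class score array of shape (classes, height, width) with "person" at index 0; the function name `foreground_mask`, the array layout, and the 0.5 threshold are all illustrative assumptions, not taken from the article's code.

```python
import numpy as np

def foreground_mask(class_scores, class_index=0, threshold=0.5):
    """Return a boolean mask that is True where the chosen class scores above the threshold."""
    return class_scores[class_index] > threshold

# Synthetic stand-in for a model output: 3 classes on a 4x4 image.
scores = np.zeros((3, 4, 4), dtype=np.float32)
scores[0, 1:3, 1:3] = 0.9  # "person" region (index 0 by assumption)
scores[1, 0, :] = 0.8      # some other class along the top row

person = foreground_mask(scores, class_index=0)  # foreground = person pixels
other = foreground_mask(scores, class_index=1)   # same code, different object type
```

Changing `class_index` is the only thing needed to make the mask respond to a different class, which is the point about the mask index above.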

--

By the way, Google does some absolutely nuts stuff with this on the Pixel 3 and 4 - they actually calculate a stereo depth map using the two autofocus sites in individual pixels. Essentially, some modern CMOS sensors use a technology called dual pixel autofocus (DPAF): each pixel contains two photodiodes, and you adjust focus until both report the same intensity (more or less). If the camera is out of focus, the two photodiodes will have different intensities.

However, what this gives you is two separate images with an extremely small (but detectable) parallax, which can be used for coarse 3D reconstruction and for segmenting foreground from background. It's nice because you get a strong physical prior, rather than having to worry about using a convnet to identify fore/background regions. (They of course apply a convnet anyway to refine the result.)
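As a toy illustration of the parallax idea (not Google's pipeline, which is 2D, sub-pixel, and learned), the sketch below estimates an integer shift between two 1D "views" by brute-force matching. The function name `estimate_shift` and the synthetic signals are assumptions made for the example.

```python
import numpy as np

def estimate_shift(left, right, max_shift=3):
    """Return the integer shift s that best aligns left[x] with right[x + s]."""
    n = left.size
    best_shift, best_cost = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        # Compare only the region where both views overlap for this shift.
        lo, hi = max(0, -s), min(n, n - s)
        cost = np.abs(left[lo:hi] - right[lo + s:hi + s]).mean()
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift

x = np.arange(32)
left = np.exp(-0.5 * ((x - 12) / 2.0) ** 2)  # a blob seen by one photodiode

in_focus = estimate_shift(left, left)                  # identical views
out_of_focus = estimate_shift(left, np.roll(left, 2))  # second view displaced by 2 samples
```

A shift of zero corresponds to the in-focus case where both photodiodes agree; a non-zero shift is the parallax signal that a depth estimate would be built from.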

https://ai.googleblog.com/2018/11/learning-to-predict-depth-...

https://ai.googleblog.com/2019/12/improvements-to-portrait-m...


It’s a simple Gaussian kernel multiplied with a triangular mask.
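One way to read "Gaussian kernel multiplied with a triangular mask" is sketched below. The kernel size, sigma, and the exact triangle (apex at the top centre, base along the bottom row) are guesses for illustration; the article's actual kernel may differ.

```python
import numpy as np

def triangular_gaussian_kernel(size=9, sigma=3.0):
    """Build a 2D Gaussian, zero it outside a triangle, and normalise it to sum to 1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    gauss = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    # Assumed triangle: row 0 (top) keeps only the centre column,
    # and the allowed width grows linearly down to the full bottom row.
    rows = y + half
    triangle = np.abs(x) <= rows / 2.0
    kernel = gauss * triangle
    return kernel / kernel.sum()

kernel = triangular_gaussian_kernel()
```

Normalising to a sum of 1 keeps overall image brightness unchanged when the kernel is used for blurring.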

You can do better by playing with intensities of pixel values as suggested in the article I linked.

I meant where this line comes from:

    mask = masks[0][0]

Presumably 0 is the class ID? For someone new to ML or object detection, it might not be obvious why you take the first channel here.
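Whether the first 0 is a class ID depends on how `masks` is laid out, which the snippet alone doesn't tell you. As one possibility, if the array were (detections, channels, height, width), which is the layout torchvision's Mask R-CNN uses for its `masks` output, the two indices would pick the first detection and its single channel rather than a class. The sketch below uses a random stand-in array with that assumed shape, not the article's model output.

```python
import numpy as np

# Hypothetical stand-in: 2 detections, 1 channel each, on a 4x6 image.
rng = np.random.default_rng(0)
masks = rng.random((2, 1, 4, 6))

first_detection = masks[0]  # drops the detection axis -> shape (1, 4, 6)
mask = masks[0][0]          # drops the channel axis too -> shape (4, 6)
binary = mask > 0.5         # threshold soft scores into foreground/background
```

Printing `masks.shape` in the original code would settle which reading applies.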

Also recent related reading: https://bartwronski.com/2020/03/15/using-jax-numpy-and-optim...

HN Discussion: https://news.ycombinator.com/item?id=22590360&ref=hvper.com&...