by genericpseudo 3022 days ago
If you're interested in this but have no background, the best place to start is "Fully Convolutional Networks for Semantic Segmentation" – https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn...

This is a very active field of research. Another thread worth pulling on is Mask R-CNN: https://arxiv.org/abs/1703.06870

It's not quite as simple as "this one has highest mAP, let's use it"; the tradeoffs are complex. In particular, as you can see in the image here, one thing DeepLab doesn't do is segment instances – so you get a mask of "people", not a mask per person. Mask R-CNN does a better job on that by design, because it predicts both bounding boxes and a mask per bounding box.
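To make the semantic vs. instance distinction concrete, here is a toy sketch in plain Python. Everything in it is made up for illustration: the 4x6 "image", the two detections, and the helper names `paste_mask` and `to_semantic` are hypothetical and are not part of the DeepLab or Mask R-CNN code. It only assumes the output shape described above, a box plus a box-sized mask per detection.

```python
# Toy 4x6 image containing two overlapping "people". All values are invented.
H, W = 4, 6

# Instance-style output: one bounding box plus one box-sized mask per detection.
# Boxes are (x0, y0, x1, y1) with the far edge exclusive.
detections = [
    {"label": "person", "box": (0, 0, 3, 4),
     "mask": [[1, 1, 0], [1, 1, 1], [1, 1, 1], [0, 1, 1]]},
    {"label": "person", "box": (2, 1, 6, 4),
     "mask": [[0, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0]]},
]


def paste_mask(det, height, width):
    """Place a box-sized mask into a full-image binary mask."""
    x0, y0, _, _ = det["box"]
    full = [[0] * width for _ in range(height)]
    for dy, row in enumerate(det["mask"]):
        for dx, value in enumerate(row):
            full[y0 + dy][x0 + dx] = value
    return full


def to_semantic(dets, label, height, width):
    """Union every instance mask of one class into a single class mask."""
    out = [[0] * width for _ in range(height)]
    for det in dets:
        if det["label"] != label:
            continue
        full = paste_mask(det, height, width)
        for y in range(height):
            for x in range(width):
                out[y][x] |= full[y][x]
    return out


# Instance segmentation: one full-image mask per person.
instance_masks = [paste_mask(d, H, W) for d in detections]

# Semantic segmentation: a single "person" mask; the per-person split is gone.
semantic_mask = to_semantic(detections, "person", H, W)

for row in semantic_mask:
    print("".join("#" if v else "." for v in row))
```

Going from instance masks to a semantic mask is a simple union, as above. Going the other way is not possible from the semantic mask alone once the two people touch or overlap, which is why it matters which kind of output a model gives you.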


Great summary. I believe both models are available in Detectron if anyone wants to give them a go:

https://github.com/facebookresearch/Detectron

Yes for Mask-RCNN. For FCN, there is R-FCN.

Overall I'm really happy to work in a domain where people share their code and models so openly. I take issue with Detectron in particular, though, because a company the size of Facebook has no excuse in 2018 to publish a major software package in Python 2. The oldest models they implement are from 2015 (excluding VGG16, which is so ubiquitous that it's available in literally every library for Python 3), and Caffe2 is quite a bit more recent than that. Like I said: no excuse.

The team behind Detectron have published an enormous amount of really good research, but the Detectron codebase struck me as "good research code" rather than something you'd ideally want in production.
Of course, I'm not criticising the fact that they publish those models, nor the models themselves. But publishing even arguably polished Python 2 code in 2018 is something I take issue with if it's not a legacy codebase.