Hacker News
by yxiongdropbox 3599 days ago
Hi everyone, this is Ying Xiong from Dropbox, and I'm the author of the blog post. Feel free to let me know if you have any questions, comments, or suggestions.

I hope you enjoy this post. Stay tuned: we will be publishing more posts in the coming weeks about other parts of our scanning feature.

1 comments

Could you elaborate more on the edge detector? I thought it was a bit of a contradiction to go from:

> We decided to develop a customized computer vision algorithm that relies on a series of well-studied fundamental components, rather than the “black box” of machine learning algorithms such as DNNs.

To:

> To overcome these shortcomings, we used a modern machine learning-based algorithm. The algorithm is trained on images where humans annotate the most significant edges and object boundaries. Given this labeled dataset, a machine learning model is trained to predict the probability of each pixel in an image belonging to an object boundary.

This seems like a crucial step in the algorithm and sounds exactly like a black box DNN...

The learning algorithm we used is not a neural network trained end to end. Instead, it is a local prediction model: it takes an image patch as input and produces a patch of the same dimensions, giving for each pixel the probability that it belongs to a document boundary. These per-patch predictions are then aggregated to reduce variance, resulting in an edge map with the same dimensions as the input image.
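
As a rough illustration of the aggregation step only (not Dropbox's implementation), here is a minimal Python sketch. It assumes overlapping sliding-window patches and plain averaging, neither of which is confirmed above. `toy_patch_predictor` is a hypothetical placeholder for the trained local model, and `patch` and `stride` are made-up parameters.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of floats in [0, 1]


def toy_patch_predictor(patch: Image) -> Image:
    """Hypothetical stand-in for the learned local model.

    Returns a patch of the same size; each value is the pixel's absolute
    deviation from the patch mean, clipped to 1.0. A real system would
    call a trained model here instead.
    """
    n = sum(len(row) for row in patch)
    mean = sum(sum(row) for row in patch) / n
    return [[min(1.0, abs(v - mean)) for v in row] for row in patch]


def aggregate_edge_map(
    image: Image,
    predict_patch: Callable[[Image], Image],
    patch: int = 4,
    stride: int = 2,
) -> Image:
    """Average overlapping per-patch predictions into a full-size edge map."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]  # sum of predictions per pixel
    count = [[0] * w for _ in range(h)]    # number of patches covering each pixel

    # Assumption: a sliding window with stride < patch, so patches overlap.
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            window = [row[left:left + patch] for row in image[top:top + patch]]
            pred = predict_patch(window)
            for dy in range(patch):
                for dx in range(patch):
                    total[top + dy][left + dx] += pred[dy][dx]
                    count[top + dy][left + dx] += 1

    # Pixels never covered by any patch (possible near borders) get 0.0.
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(w)]
        for y in range(h)
    ]
```

Averaging is just one simple way to combine overlapping predictions; each pixel's final value then depends on several patches rather than one, which is the variance-reduction idea in spirit.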

What is a patch in your case? Are you running a sliding window over the image, or tiling it? And are you marking each pixel as belonging to the edge of a document, or marking detected edges as valid document boundaries? Also, how do you model the links between the four sides? A reference to a paper or a follow-up blog post would be greatly appreciated.

Great work. Laurent

Ah ok, thanks! Do you have a paper/reference for this (I guess you have a proprietary implementation though)?

As the sibling says, this sounds like a good random forest problem, so you just pass in a load of patches that have been labelled with ground truth and let the classifier give you a probability for each pixel?

I believe the algorithm he's using is a random forest. Not exactly a black-box DNN, but close enough :)