Hacker News
by joshvm 3604 days ago
Could you elaborate on the edge detector? It seemed a bit contradictory to go from:

> We decided to develop a customized computer vision algorithm that relies on a series of well-studied fundamental components, rather than the “black box” of machine learning algorithms such as DNNs.

To:

> To overcome these shortcomings, we used a modern machine learning-based algorithm. The algorithm is trained on images where humans annotate the most significant edges and object boundaries. Given this labeled dataset, a machine learning model is trained to predict the probability of each pixel in an image belonging to an object boundary.

This seems like a crucial step in the algorithm and sounds exactly like a black box DNN...


The learning algorithm we used is not a neural network trained end-to-end. Instead, it is a local prediction model that takes an image patch as input and produces a patch of the same dimensions, giving each pixel's probability of belonging to a document boundary. Those per-patch predictions are then aggregated to reduce variance, resulting in an edge map with the same dimensions as the input image.
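
For readers who want to see the shape of this, here is a minimal sketch of the patch-predict-then-aggregate idea. It is not the actual implementation: the function names, the `patch` and `stride` parameters, and the choice of a plain average as the aggregation step are all assumptions made for illustration, and `toy_predictor` is a placeholder standing in for a trained model.

```python
from typing import Callable, List

Image = List[List[float]]  # row-major grayscale image, values in [0, 1]


def aggregate_patch_predictions(
    image: Image,
    predict_patch: Callable[[Image], Image],
    patch: int = 4,
    stride: int = 2,
) -> Image:
    """Average overlapping per-patch predictions into a full-size edge map.

    Assumption: aggregation is a plain per-pixel mean of every patch
    prediction that covers the pixel.
    """
    h, w = len(image), len(image[0])
    if h < patch or w < patch:
        raise ValueError("image must be at least one patch in each dimension")

    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]

    # Patch origins along each axis; add a final origin so the last patch
    # reaches the border and every pixel is covered at least once.
    ys = list(range(0, h - patch + 1, stride))
    xs = list(range(0, w - patch + 1, stride))
    if ys[-1] != h - patch:
        ys.append(h - patch)
    if xs[-1] != w - patch:
        xs.append(w - patch)

    for y in ys:
        for x in xs:
            tile = [row[x:x + patch] for row in image[y:y + patch]]
            pred = predict_patch(tile)  # same shape as the input tile
            for dy in range(patch):
                for dx in range(patch):
                    total[y + dy][x + dx] += pred[dy][dx]
                    count[y + dy][x + dx] += 1

    return [[total[r][c] / count[r][c] for c in range(w)] for r in range(h)]


def toy_predictor(tile: Image) -> Image:
    """Placeholder (NOT a learned model): scores each pixel by how far it
    deviates from the tile mean, clipped to [0, 1]."""
    n = len(tile) * len(tile[0])
    mean = sum(sum(row) for row in tile) / n
    return [[min(1.0, abs(v - mean)) for v in row] for row in tile]


if __name__ == "__main__":
    # 6x6 image: dark left half, bright right half.
    img = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
    edge_map = aggregate_patch_predictions(img, toy_predictor, patch=4, stride=2)
    for row in edge_map:
        print(" ".join(f"{v:.2f}" for v in row))
```

With `stride` smaller than `patch` the patches overlap (a sliding window) and each pixel receives several predictions to average; with `stride == patch` the image is simply tiled and most pixels get a single prediction, so there is little variance reduction.
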
What is a patch in your case? Are you running a sliding window over the image or tiling it? Then, are you marking each pixel as belonging to the edge of a document, or are you marking detected edges as valid document boundaries? Also, how do you model the links between the 4 sides? A reference to a paper or a follow-up blog post would be greatly appreciated.

Great work. Laurent

Ah ok, thanks! Do you have a paper/reference for this (I guess you have a proprietary implementation though)?

As the sibling says, this sounds like a good random forest problem, so you just pass in a load of patches that have been labelled with ground truth and let the classifier give you a probability for each pixel?
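
To make the question concrete, here is a rough sketch of the data preparation being described: cutting an image and its human-annotated boundary mask into aligned patches. Everything here is hypothetical (the function name, the `patch` and `stride` parameters, the flattened-pixel features), and whether the real system uses a random forest at all is a guess from this thread, not something the author has confirmed.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale pixels
Mask = List[List[int]]     # 1 = annotated boundary pixel, 0 = background


def make_training_pairs(
    image: Image,
    boundary_mask: Mask,
    patch: int = 3,
    stride: int = 1,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return (features, targets), one row per patch.

    features[i] is the flattened pixel patch; targets[i] is the flattened
    ground-truth label patch at the same location. Raw pixels are used as
    features only to keep the sketch short.
    """
    h, w = len(image), len(image[0])
    features: List[List[float]] = []
    targets: List[List[int]] = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            features.append(
                [image[y + dy][x + dx] for dy in range(patch) for dx in range(patch)]
            )
            targets.append(
                [boundary_mask[y + dy][x + dx] for dy in range(patch) for dx in range(patch)]
            )
    return features, targets


if __name__ == "__main__":
    # 4x4 toy image with a vertical boundary annotated in column 2.
    image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    mask = [[0, 0, 1, 0] for _ in range(4)]
    X, Y = make_training_pairs(image, mask, patch=3, stride=1)
    print(len(X), "patches,", len(X[0]), "features each")
```

Pairs like these could then be handed to any learner that supports multiple outputs per sample (scikit-learn's `RandomForestClassifier`, for example, accepts a 2-D target array), giving one predicted probability per pixel of each patch.
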

I believe the algorithm he's using is a random forest: not exactly a black-box DNN, but close enough :)