Hacker News new | ask | show | jobs
by yxiongdropbox 3606 days ago
The learning algorithm we used is not a neural network that got trained in end-to-end fashion. Instead, it is a local prediction model that takes an input image patch and produces a patch of the same dimension with probability for each pixel of belonging to a document boundary. Those per-patch predictions are then aggregated together to reduce variance, resulting in an edge map of the same dimension as the input image.
2 comments

What is a patch in your case? Are you running a sliding window over the image or tiling it? Then are you marking each pixel as belonging to the edge of a document or are you marking detected edges as valid document boundaries? Also how do you model the links between the 4 sides? A reference to a paper or follow up blog post would be greatly appreciated.

Great work. Laurent

Ah ok, thanks! Do you have a paper/reference for this (I guess you have a proprietary implementation though)?

As the sibling says, this sounds like a good random forest problem, so you just pass in a load of patches that have been labelled with ground truth and let the classifier give you a probability for each pixel?