Hacker News
by eikaramba 3625 days ago
Co-founder of fileee.com here (we do this for a living :) ). I can tell you that, in general, adaptive thresholding works better, because it is less error-prone in low-contrast situations and when edges are missing. That said, there are also cases where Canny performs better, which is why we actually decided to use a machine-learning approach to decide when to use which. There is even more one can do to improve the detection, e.g. use a Hough transform to find the straight lines along the document edges, or use variance or an FFT to assess whether possible "document" candidates are just garbage rectangles or real documents.
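The comment doesn't spell out how the variance/FFT check works, so here is only a minimal sketch of the general idea, under my own assumptions and not fileee's actual method. It assumes the candidate region has already been cropped to a small grayscale patch (a list of rows of 0-255 values). It accepts the patch if it has enough intensity variance (contrast) and if enough of its non-DC spectral energy sits at high frequencies (fine detail such as text strokes). The function names, the 0.25 frequency cutoff and both thresholds are hypothetical. The naive DFT is written out so the sketch needs only the standard library; real code would use something like `numpy.fft.fft2` on a downsampled patch.

```python
import cmath
import math


def patch_variance(patch):
    """Population variance of the grayscale intensities in a 2D list."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def dft2(patch):
    """Naive 2D discrete Fourier transform, O((H*W)^2); fine for small patches."""
    h, w = len(patch), len(patch[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * math.pi * (u * y / h + v * x / w)
                    acc += patch[y][x] * cmath.exp(1j * angle)
            out[u][v] = acc
    return out


def high_freq_ratio(patch, cutoff=0.25):
    """Share of non-DC spectral energy whose normalized frequency radius exceeds cutoff."""
    h, w = len(patch), len(patch[0])
    spectrum = dft2(patch)
    total = high = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip the DC term, which only encodes mean brightness
            # Map DFT indices to signed frequencies in [-0.5, 0.5) cycles/pixel.
            fu = (u if u < h / 2 else u - h) / h
            fv = (v if v < w / 2 else v - w) / w
            energy = abs(spectrum[u][v]) ** 2
            total += energy
            if math.hypot(fu, fv) > cutoff:
                high += energy
    return high / total if total > 0 else 0.0


def looks_like_document(patch, min_variance=100.0, min_hf_ratio=0.3):
    """Hypothetical heuristic: enough contrast AND enough fine detail (e.g. text)."""
    return patch_variance(patch) >= min_variance and high_freq_ratio(patch) >= min_hf_ratio
```

For instance, a 16x16 patch of alternating black and white columns (a crude stand-in for text) passes both checks. A smooth brightness ramp, such as a shadow, has plenty of variance but almost no high-frequency energy and is rejected, and a flat patch fails on variance alone. In practice the thresholds would have to be tuned (or learned) on real candidates.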

I am interested in learning more about using variance and FFT to find document boundaries. Can you elaborate, or link to any good resources on this? I'm very keen to learn :)