by nytf3 3111 days ago
Cool writeup! Although I agree those captchas are fairly trivial.

In college I wrote a term paper on breaking Microsoft's captcha (which is a little harder, but not by much) in two ways: first with a simple template-based classification method and then with a CNN approach.

https://www.dropbox.com/s/jfp5xbv3eh589f6/6_857_CAPTCHA.pdf?...

At the end, we go over approaches that would help captchas fight attacks. I think the quick flickering approach would work best (split the image into uneven parts and flicker them quickly, so that the human eye can read the aggregate image but any single slice doesn't show the full picture, and the superimposed image is incorrect).
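
To make the splitting step concrete, here is a minimal sketch of one way to cut an image into uneven parts that are shown one per frame. It is an assumption-laden illustration, not the method from the paper: the function name `split_into_flicker_frames`, the vertical-band layout, and the list-of-rows image format are all hypothetical. It also leaves out the property that the superimposed image is incorrect; in this simplified version, adding the frames back together does reproduce the original.

```python
import random

def split_into_flicker_frames(image, n_frames, seed=0):
    """Split a 2D image (list of rows) into n_frames frames.

    The columns are cut into uneven vertical bands. Each band is shown in
    exactly one frame and left blank (0) in all the others.
    Requires n_frames <= image width.
    """
    rng = random.Random(seed)
    width = len(image[0])
    # Pick n_frames - 1 distinct cut points so the bands have uneven widths.
    cuts = sorted(rng.sample(range(1, width), n_frames - 1))
    bounds = [0] + cuts + [width]
    frames = []
    for i in range(n_frames):
        lo, hi = bounds[i], bounds[i + 1]
        frame = [[row[x] if lo <= x < hi else 0 for x in range(width)]
                 for row in image]
        frames.append(frame)
    return frames

# Usage: a tiny all-ones "image" split across three frames.
image = [[1] * 8 for _ in range(2)]
frames = split_into_flicker_frames(image, 3)
```

Cycling through `frames` quickly is what would produce the flicker; no single frame contains the whole image.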

3 comments

Cool idea and thanks for sharing!

One of the challenges here (which I'm sure you are very aware of) is that perception tricks that fool computers, like flickering images, can also shut out users with various visual impairments. Sometimes users with even minor or infrequently symptomatic visual impairments won't be able to read an image[1] that uses a special "trick" like this.

For example, consider the risk of triggering an epileptic seizure with flickering. At a certain point it becomes an accessibility/legal issue.

[1] The animated example from nytf3's paper - please note that it contains strong flickering: http://people.csail.mit.edu/recasens/images/captcha.gif

Would flickering even be necessary? Why not just overlay several transparent GIFs/PNGs? It’s still hackable (so is the flickering solution), but you could also add in a few more tricks to make it more work for the hackers. For example, combine the layers dynamically into a single image with a separate HTTP request to retrieve the (random) positions of each layer within that image. (Just a thought...you could make it as simple or as complex as you want.)
At that point, you could have your captcha-breaker wait for the page to finish rendering, screenshot the relevant portion of the page, and solve from there. Seems easier than trying to download and stitch together the transparent GIFs or decode the jumble of HTTP requests.
That seems more like security by obscurity - as soon as somebody realises you are doing that, they can visit your site with headless Chrome and break it easily.
But whatever process the human eye uses to piece together a flickering image should itself be fairly simple, right?
Couldn't you just capture a sample of frames and take the mean?
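
A rough sketch of what that averaging could look like, assuming the frames have already been captured as equally sized 2D grids of pixel intensities (the helper names `mean_frame` and `binarize` are made up for illustration). Whether this defeats the scheme in the paper is not established here: the parent comment says the superimposed image is incorrect, which would presumably be meant to defeat exactly this kind of attack.

```python
def mean_frame(frames):
    """Pixel-wise mean over equally sized 2D frames (lists of rows)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]

def binarize(image, threshold):
    """Map every pixel above the threshold to 1, everything else to 0."""
    return [[1 if v > threshold else 0 for v in row] for row in image]

# Usage: three captured frames, each showing a different part of the image.
captured = [
    [[1, 0, 0]],
    [[0, 1, 0]],
    [[0, 0, 0]],
]
averaged = mean_frame(captured)
recovered = binarize(averaged, 0)
```

If each pixel is lit in at least one captured frame and nothing else is drawn, thresholding the mean just above zero gives back the union of the frames.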