by comntr 2536 days ago
That's right, I haven't.

The puzzle will be sent as an SVG, obviously.

Hmm... "off-the-shelf OCR" sounds neat, but anyone who knows those words isn't an average spammer. The goal of basic SVG puzzles is to block the 99% of spammers who just type dumb comments on their keyboards. The remaining 1% can be handled by human mods.

TBH, I don't like reCAPTCHA-like solutions. In my experience they're just annoying, and if they rely on any 3rd-party service, I'll give them a hard pass for that reason alone. My approach is to use trivial SVG-style captchas with adjustable complexity: e.g. instead of asking "23+34", we can ask "log(32)/log(2)" and effectively filter out everyone except people familiar with math, or "md5(2615), first 7 hex digits" and let in only people familiar with cryptography. Forcing users to pick out birds and crosswalks will just make them upset, IMHO.
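A minimal Python sketch of the "adjustable complexity" idea, assuming three hypothetical tiers modeled on the examples above (`arithmetic_challenge`, `log_challenge`, `md5_challenge` and `make_challenge` are illustrative names, not part of the extension). It only produces the question text and checks the answer; drawing that text as an SVG puzzle would be a separate step.

```python
from __future__ import annotations

import hashlib
import math
import random

# Hypothetical challenge tiers mirroring the examples in the comment.
# Each generator returns (question_text, expected_answer).

def arithmetic_challenge(rng: random.Random) -> tuple[str, str]:
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"{a}+{b}", str(a + b)

def log_challenge(rng: random.Random) -> tuple[str, str]:
    # log(2**k)/log(2) == k; round() hides float noise such as 4.9999...
    k = rng.randint(3, 10)
    n = 2 ** k
    return f"log({n})/log(2)", str(round(math.log(n) / math.log(2)))

def md5_challenge(rng: random.Random) -> tuple[str, str]:
    n = rng.randint(1000, 9999)
    digest = hashlib.md5(str(n).encode("ascii")).hexdigest()
    return f"md5({n}), first 7 hex digits", digest[:7]

TIERS = {"easy": arithmetic_challenge, "math": log_challenge, "crypto": md5_challenge}

def make_challenge(level: str, seed: int | None = None) -> tuple[str, str]:
    """Pick a generator by difficulty level and produce one challenge."""
    return TIERS[level](random.Random(seed))

def check_answer(expected: str, given: str) -> bool:
    # Ignore surrounding whitespace and letter case (hex digits may be typed uppercase).
    return given.strip().lower() == expected.lower()
```

`random.Random` is used so a seed can be fixed for testing; a real deployment would more likely draw numbers from `secrets` so the challenges can't be predicted.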


Whether a spammer takes the time to break a custom captcha depends entirely on the site's traffic.

I work on a site with 10 million monthly pageviews, and spammers still register through a form that has reCAPTCHA and email verification. We've tried hidden input fields and other tricks, but we've consistently had 5 new spam accounts every day. With SVG they can just take a screenshot of what a user sees and send that to OCR. Complex math will turn away as many legitimate users as spammers.
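The "hidden input fields" trick is usually a honeypot: the form contains a field that CSS hides from people, so only a bot that fills in every field will populate it. A minimal sketch, assuming a hypothetical field name `website` and a form submission already parsed into a dict; as noted above, it only catches naive bots.

```python
from __future__ import annotations

HONEYPOT_FIELD = "website"  # hypothetical field name

# Moved off-screen rather than display:none, and skipped by keyboard focus.
HONEYPOT_HTML = (
    '<div style="position:absolute;left:-9999px" aria-hidden="true">'
    f'<input type="text" name="{HONEYPOT_FIELD}" tabindex="-1" autocomplete="off">'
    "</div>"
)

def looks_like_bot(form: dict[str, str]) -> bool:
    """Return True if the hidden field came back non-empty."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```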

The only real way to stop spam is to use a 3rd party API to detect it, or use something like a karma system that builds up over time. I think we're at the point where simple solutions won't work well unless you have a small site.

That's true when we talk about 10M monthly pageviews, but I doubt that this little extension will reach such popularity levels. If this somehow happens, by that time there will be a way to enable 3rd party captchas for any page.

You've said SVG twice now, but I don't see what's obvious about it.

It's an XML document, which should be easier to figure out than a raster image format such as JPEG or PNG.

The catch is that the text will be represented as small geometric SVG shapes, so the spammer will first need to render the SVG to PNG and then run text recognition tools. And in that SVG we can easily add some CSS animations that make sure the entire image is never rendered at once, so spammers will need to run a full browser to take screenshots and then assemble the image from multiple frames.
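To make the animation idea concrete, here is a minimal Python sketch, assuming the glyph strokes have already been approximated by rectangles (the `animated_svg` helper and its parameters are hypothetical). Rectangles are assigned round-robin to `slices` groups, and a CSS `step-end` animation shows each group for only `1/slices` of the cycle, so no single frame contains the whole text.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize as a default xmlns, no "ns0:" prefix


def animated_svg(
    rects: list[tuple[int, int, int, int]],
    slices: int = 3,
    cycle_s: float = 1.5,
    width: int = 200,
    height: int = 60,
) -> str:
    """Return SVG markup where rectangle i (x, y, w, h) belongs to slice
    i % slices and each slice is visible for 1/slices of the cycle."""
    visible_pct = 100 / slices
    css = (
        # Opacity holds at 1 until visible_pct%, then jumps to 0 (step-end).
        f"@keyframes show {{ 0% {{ opacity: 1 }} {visible_pct:.3f}% {{ opacity: 0 }} "
        "100% { opacity: 0 } } "
        f"rect {{ animation: show {cycle_s}s step-end infinite; }} "
    )
    for s in range(1, slices):
        # A negative delay starts each slice at a different phase of the cycle.
        delay = -s * cycle_s / slices
        css += f".s{s} {{ animation-delay: {delay:.3f}s; }} "

    svg = ET.Element(f"{{{SVG_NS}}}svg", width=str(width), height=str(height),
                     viewBox=f"0 0 {width} {height}")
    ET.SubElement(svg, f"{{{SVG_NS}}}style").text = css
    for i, (x, y, w, h) in enumerate(rects):
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(x), "y": str(y), "width": str(w), "height": str(h),
            "class": f"s{i % slices}",
        })
    return ET.tostring(svg, encoding="unicode")
```

As the replies point out, an attacker running a real browser can still capture several frames and combine them; the sketch only shows how to keep any single frame incomplete.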

Won’t it have to be converted to a raster image before it can be OCRd?

Granted, all you need to do is render it to a canvas, but that’s an extra step on top of everything you need for a raster image, so I’m not sure it’s easier.

And just rendering it to a canvas may be very tricky if the captcha is animated with CSS, i.e. if it moves a bit and different parts of it appear at different times.