ReCAPTCHA Dataset (deathlyface.tech)
64 points by deathlyface 2210 days ago
4 comments

Just for fun I started answering all of the Google captchas incorrectly: I’d select every image except the ones matching the challenge, and then go to DuckDuckGo when I got tired of doing that. A few things I noticed: if you select an incorrect image, it will probably show up again later to see if you answer it incorrectly again; you can exhaust the number of replacement images that pop up when you answer one incorrectly; and after a week of doing this I stopped seeing the captchas.
Whenever I do that I just get an endless stream of more captchas to fill out.
reCAPTCHA has been broken for years (thanks to Google) and I wish it would die soon...
They are so annoying that I block captcha scripts in uBlock Origin. So far I've blocked the scripts of reCAPTCHA and hCaptcha [1]. If a website uses a script captcha I leave. But reCAPTCHA once told me I was a bot, so I guess it's OK that I don't use websites that don't like bots.

[1] The 3 uBlock Origin filters:

  ||www.google.com/recaptcha/*$script,important
  ||www.recaptcha.net^$script,important
  ||hcaptcha.com^$script,important
It's not even that effective. Spammers just pay people in Bangladesh to do them.
The point of captchas is that they cost resources (human time, money, computing power) to solve. Captcha is essentially a proof-of-work scheme, and a very nasty one, because it is designed to torture users instead of relying on computers to do the work. There are computer PoW-based alternatives to captcha, but they are not widely used for some reason (why?).
> There are computer PoW-based alternatives to captcha, but they are not widely used for some reason (why?)

Because when you're renting a cheap botnet for your spam campaign, you don't care that some poor random person's device has to solve a PoW. Ironically you punish everyone except spammers because they certainly aren't using their own hardware.

This is why stuff like hashcash (which had email spam in mind) was dead on arrival.

People (like the commenter above) often assert that spammers can just buy human labor thus recaptcha is useless. But you're already in a whole different ballgame with sites like Twitter if you're attracting targeted human attacks.

If you replace the CAPTCHA with a Hashcash-like PoW system, couldn't the server increase the difficulty when it receives too many connections from an IP?

Even with a huge botnet, spammers only have so many IPs and so much computing power.

The difficulty could easily be adjusted to have a computing time ranging from a few milliseconds on a cheap smartphone (default) to a few minutes/hours on a desktop computer (for abusers).
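
To make the idea concrete, here is a minimal sketch of a Hashcash-like challenge whose difficulty grows with the number of requests seen from an IP. It is not how any real CAPTCHA replacement works; the constants (`BASE_BITS`, `MAX_BITS`, `FREE_REQUESTS`), the in-memory counter, and all function names are assumptions made up for illustration.

```python
import hashlib
import itertools
import os
from collections import defaultdict
from typing import Tuple

# All constants below are assumed values, chosen only for illustration.
BASE_BITS = 8       # default difficulty: required leading zero bits
MAX_BITS = 24       # cap so the difficulty cannot grow without bound
FREE_REQUESTS = 5   # requests per IP served at the default difficulty

# Hypothetical in-memory counter; a real server would expire entries over time.
request_counts = defaultdict(int)


def difficulty_for(ip: str) -> int:
    """Required leading zero bits for this IP, rising with its request count."""
    extra = max(0, request_counts[ip] - FREE_REQUESTS)
    return min(BASE_BITS + extra, MAX_BITS)


def issue_challenge(ip: str) -> Tuple[bytes, int]:
    """Count the request and hand out a random challenge plus its difficulty."""
    request_counts[ip] += 1
    return os.urandom(16), difficulty_for(ip)


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Server side: a single hash checks the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= bits


def solve(challenge: bytes, bits: int) -> int:
    """Client side: brute-force a nonce, about 2**bits hashes on average."""
    for nonce in itertools.count():
        if verify(challenge, nonce, bits):
            return nonce
```

Each extra bit doubles the expected client work while verification stays at one hash, so the server can scale the cost per IP cheaply. The counter is keyed on IP alone, so the sketch does nothing about the botnet objection raised earlier in the thread.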

Then it becomes a tax on people with underpowered phones, doesn't it?
It also becomes a tax on people who are pro-privacy (e.g., running uBlock Origin), since they don’t look like “normal” users and therefore have to solve captchas. All. The. Damned. Time.
@Chirael have you tried using the Privacy Pass extension? I hate Captchas as well but the extension makes it somewhat easier.
Does it work for all reCAPTCHAs or just CloudFlare's?
Are these just scraped? Or how have they been obtained?
> it took hundreds of hours for me to collect it

I assume they were hand-scraped. (Actually, you could probably mturk this if you needed a larger dataset.)

Some of them were collected manually, and the rest automatically (using a script).
Thank you very much for the images!

The link in your blog still redirects to Google Drive :)

I fixed it. Thanks for reminding me.
Thanks for the effort, this set looks really useful.
Does this guy own the rights to these images?
Is there a market for these tiny photographs?
No, just an open door to litigation.