by lofenfew 572 days ago
It might be worth noting that this captcha, including the harder version the OP encountered, is not the hardest one 4chan can serve. There is a still harder version which is sent to less trustworthy IPs. I imagine it could still be solved with computer vision. This partly misses the point though, since 4chan has been continuously altering their captcha since it was released, making it difficult to create a permanent solution that won't break down the road.

Datacenter IPs can’t even post at all, never mind needing to solve a CAPTCHA. That’s why the accusations of “VPN shill” are usually wrong, as is the assumption of anonymity – 4chan is in fact one of the least anonymous sites on the internet. The optional username feature gives it a veneer of anonymity, but the strict IP requirements ensure almost every post is attributable to a residential internet connection, and reliably associable with other posts from that same connection.
4chan tries to make its users anonymous to each other. There's nothing in there about you being anonymous to their servers.
Some datacenter IPs can post fine, mostly just not those belonging to any large hosting company. I would mention a list of ones I know aren't blocked, but, well, that might get them blocked.
That’s surprising to me. I assumed they were using some service (like Cloudflare) with an updated list of non-residential IP addresses.

I’ve only ever tried to post through Cloudflare WARP (or Apple Private Relay, which is also Cloudflare but with a different exit IP range). Once I realized that didn’t work, I thought maybe it wasn’t worth posting at all :) I don’t like the idea of my ISP having any suspicion I posted to 4chan (even if it’s technically https yadda yadda…)

You can get residential IPs nowadays. They are much more expensive for an individual, but for a business or nation-state, it is a feasible option.
What about users behind CGNAT, like mobile users?
That’s attributable with the right warrant and correlation with other data available to the ISP.

CGNAT is not an anonymity mechanism – at best it may be a very crude one, but the carriers will make extra effort to remove that anonymity through logging, retention, and segmentation.

Some mobile users can post but I think they've gone so far as to ban entire ISP mobile IP ranges to prevent people from constantly rolling new IPs on their phone.
Nice callback to Moot banning an entire Australian region (Queensland or Victoria, if memory serves) because Aussies did an outsized share of shitposting, and of Aussies those particular ones were the worst.
I'm pretty sure all of T-Mobile is rangebanned. Phoneposters are usually told to buy a pass.
That sounds like old 2ch.net. Was that plan from Hiroyuki, by chance? IIRC they entrusted the keys to the kingdom to that guy, or am I mistaken...
Hiro owns 4chan. I remember something about Moot giving him the website for free.
I was aware that he was involved in ops, but didn't know he has full control, thanks...
"Attributable" means by law enforcement, and mobile carriers, like all ISPs, must keep logs. In this case, for who had which IP address when.

(Otherwise, it's akin to the usual confusion between anonymity and pseudonymity.)

That’s true, but to be fair my original comment also said posts would be reliably associable with other posts from the same IP. With CGNAT, that association will be slightly less reliable, but not meaningfully so. The segment of the population that posts on 4chan is so small that there is a negligible chance of two 4chan users sharing an exit IP and time window. Even with non-overlapping time windows, the population will be small enough for stylometry (and other factors) to remove any remaining ambiguity.
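To put a rough number on the shared-exit-IP argument, here is a back-of-the-envelope sketch in Python. It assumes each subscriber behind a CGNAT exit IP posts independently with the same small probability during the time window, which is a simplification. The function name `prob_shared_ip` and every figure in the loop are made up for illustration; none of them are measured values.

```python
def prob_shared_ip(users_per_ip: int, poster_fraction: float) -> float:
    """Chance that at least two users behind one exit IP are posters.

    Assumes each user is a poster independently with probability
    `poster_fraction` (a simplifying assumption, not a measured value).
    """
    m, p = users_per_ip, poster_fraction
    if m < 2:
        return 0.0  # a collision needs at least two users
    none = (1 - p) ** m                        # nobody behind the IP posts
    exactly_one = m * p * (1 - p) ** (m - 1)   # exactly one user posts
    return 1 - none - exactly_one


if __name__ == "__main__":
    # Hypothetical inputs: users sharing one exit IP in a time window,
    # and the fraction of subscribers who post at all in that window.
    for users, fraction in [(50, 0.0001), (200, 0.0001), (200, 0.001)]:
        chance = prob_shared_ip(users, fraction)
        print(f"{users} users, {fraction:.2%} posters: {chance:.6f}")
```

Under these assumptions the collision chance scales roughly with the square of the poster fraction, so it stays small unless many users share one exit IP or posters are far more common than assumed here.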
Yeah, I encountered those as well in my data gathering. I threw them out from the training set, but I kept them for possible future experimentation.
Can you upload a few of these samples somewhere?
I need to manipulate the data a bit, because right now it's just raw, unaligned foreground/background images with solutions. I need to do the alignment and save them as images rather than JSON files. I'll do that when I have the time.
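For anyone curious what the "save them as images rather than JSON files" step might look like, here is a minimal Python sketch. The real file layout isn't described above, so everything about the format is an assumption: the keys `fg`, `bg`, and `solution` are hypothetical, as is the idea that the images are stored as base64-encoded PNG data. It also skips the foreground/background alignment entirely and just writes the two layers out as they are.

```python
import base64
import json
from pathlib import Path


def export_sample(json_path: Path, out_dir: Path) -> list[Path]:
    """Write the image layers and solution of one sample to separate files.

    Assumed (hypothetical) JSON layout:
      {"fg": "<base64 PNG>", "bg": "<base64 PNG>", "solution": "<text>"}
    """
    sample = json.loads(json_path.read_text())
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = json_path.stem
    written = []

    # Decode each layer and save it untouched; no alignment is attempted.
    for key in ("fg", "bg"):
        target = out_dir / f"{stem}_{key}.png"
        target.write_bytes(base64.b64decode(sample[key]))
        written.append(target)

    # Keep the solution next to the images as a plain text label.
    label = out_dir / f"{stem}_solution.txt"
    label.write_text(sample["solution"])
    written.append(label)
    return written
```

Calling `export_sample(Path("sample_0001.json"), Path("out"))` would produce `sample_0001_fg.png`, `sample_0001_bg.png`, and `sample_0001_solution.txt` under `out/`, provided the file really has that shape.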