| HN Mirror


	by FeepingCreature 273 days ago
	In principle, it should be possible to identify malign IPs at scale by using a central service and reporting IPs probabilistically. That is, if you report every thousandth page hit with a simple UDP packet, the central tracker gets very low load and still enough data to publish a Bloom filter of abusive IPs; say, a million bits gives you a pretty low false-positive rate. (If it's only ~10k malign IPs, tbh you can just keep an LRU counter and enumerate all of them.) A billion hits per hour across the tracked sites would still only correspond to ~50KB/s inflow on the tracker service. Any individual participating site doesn't necessarily get many hits per source IP, but aggregating across a few dozen sites should highlight the bad actors. Then the clients just pull the Bloom filter once an hour (about 125KB for a million bits, uncompressed) and drop requests that match. Any halfway modern LLM could probably code the backend for this in a day or two and it'd run on a RasPi. Some org just has to take charge and provide the infra and advertisement.
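
A rough sketch of the two client-side pieces described above: sampling roughly one hit in a thousand into a UDP report, and checking incoming IPs against a million-bit Bloom filter. The comment does not give a hash count, a hashing scheme, or a packet format, so `NUM_HASHES`, the BLAKE2-based double hashing, and the plain-text payload are all assumptions made for illustration, as are the names `BloomFilter`, `expected_false_positive_rate`, and `maybe_report`.

```python
import hashlib
import math
import random

FILTER_BITS = 1_000_000   # "a million bits", from the comment
NUM_HASHES = 7            # assumption: the comment does not specify this
SAMPLE_RATE = 1 / 1000    # "every thousandth page hit"


class BloomFilter:
    """Fixed-size Bloom filter over strings (here: IP addresses)."""

    def __init__(self, bits=FILTER_BITS, hashes=NUM_HASHES):
        self.bits = bits
        self.hashes = hashes
        self.data = bytearray((bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from one 128-bit digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.bits for i in range(self.hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.data[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.data[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def expected_false_positive_rate(n_items, bits=FILTER_BITS, hashes=NUM_HASHES):
    """Standard approximation: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-hashes * n_items / bits)) ** hashes


def maybe_report(ip, tracker_addr, sock, rng=random.random):
    """Send one UDP datagram for about 1 in 1000 hits. Returns True if sent.

    `sock` is expected to be a UDP socket (socket.SOCK_DGRAM); the payload
    format (just the IP as text) is a placeholder, not a real protocol.
    """
    if rng() >= SAMPLE_RATE:
        return False
    sock.sendto(ip.encode(), tracker_addr)
    return True
```

Under these assumptions the filter is 125,000 bytes, and the approximation puts the false-positive rate well under one in a million at ~10k entries. It degrades quickly as the entry count grows toward 100k, so the hash count and size would need tuning against the real number of flagged IPs.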

2 comments

01HNNWZ0MV43FF 273 days ago

The hard part is the trust, not the technology. Everyone has to trust that everyone else is not putting bogus data into that database to hurt someone else.

It's mathematically similar to the "Shinigami Eyes" browser plug-in and database, which has been found to have unreliable data.

FeepingCreature 273 days ago

Personally talk to every individual participating company. Provide an endpoint that hands out a per-client hash that rotates every hour, stick it in the UDP packet, whitelist query IPs. If somebody reports bogus data, no problem, just clear the filter and rebuild; it's not like historic data is important here. You can even (one more hour of vibecoding) track convergence by checking how many bits of reported IPs match the existing (decaying) filter; this lets you spot outlier reporters. If somebody always reports a ton of IPs that nobody else does, they're probably a bad actor. Hell, put a ten dollar monthly fee on it, that'll already exclude 90% of trolls.

I'm pretty pro AI, but these incompetent assholes ruin it for everybody.
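
One way the hourly-rotating per-client hash and the outlier check from this comment might look. This is a sketch under assumptions, not the commenter's design: deriving the token as an HMAC over the client ID and the hour index is one possible scheme, accepting the previous hour's token is an added tolerance for clock skew, and `novelty_ratio` compares plain sets of IPs rather than matching bits against the decaying filter. All function names are hypothetical.

```python
import hashlib
import hmac

ROTATION_SECONDS = 3600  # "rotates every hour"


def client_token(server_secret, client_id, now):
    """Token for `client_id` in the hour window containing `now` (Unix seconds)."""
    window = int(now // ROTATION_SECONDS)
    msg = f"{client_id}:{window}".encode()
    return hmac.new(server_secret, msg, hashlib.sha256).hexdigest()


def token_is_valid(server_secret, client_id, token, now):
    """Accept the current window's token, or the previous one (assumed grace period)."""
    for offset in (0, 1):
        expected = client_token(
            server_secret, client_id, now - offset * ROTATION_SECONDS
        )
        # Constant-time comparison to avoid leaking the token via timing.
        if hmac.compare_digest(expected.encode(), token.encode()):
            return True
    return False


def novelty_ratio(reported_ips, known_ips):
    """Fraction of one reporter's IPs that nobody else has flagged.

    A reporter whose ratio stays near 1.0 over many hours is an outlier
    worth a closer look; the threshold is left to the operator.
    """
    reported = set(reported_ips)
    if not reported:
        return 0.0
    return len(reported - set(known_ips)) / len(reported)
```

The tracker would drop any UDP report whose token fails `token_is_valid`, which keeps unauthenticated senders out without needing per-packet state beyond the shared secret.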

pixl97 273 days ago

>malign IPs at scale

As talked about elsewhere in this thread, residential devices being used as proxies behind CGNAT ruins this. Not getting rid of IPv4 years ago is finally coming to bite us in the ass in a big way.

codersfocus 273 days ago

IPv6 wouldn't solve this, since IPs would be too cheap to meter.
