by sigill 3565 days ago
The submission links to a blog post on how the data was retrieved: http://255.wf/2016-09-18-mass-analyzing-a-chunk-of-the-inter...

> For this little experiment, I’ve setup a single KVM instance, running a single 2GHz vCore with 2GIB of RAM and 10GiB of HDD space. This is sufficient. Probing for ftp access is an extremely CPU-intensive task. You are going to hit bottlenecks in this order:
>
> CPU > Memory > a whole lot of nothing > network
>
> While the rescan was running, only about 1 to 2kpps were exchanged, while the CPU was pinned at 100%.

So at 2 GHz and 1 to 2 kpps, his setup spent roughly 1-2 million clock cycles per packet, and at least that much per probe. That's a lot!
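
The arithmetic behind that estimate, as a quick sketch. The 2 GHz clock and the 1 to 2 kpps range come from the quoted post; the function name is just for illustration, and the result is cycles per packet exchanged, which is only a rough stand-in for cycles per probe.

```python
CLOCK_HZ = 2e9  # single 2 GHz vCore, as stated in the quoted post


def cycles_per_packet(clock_hz, packets_per_second):
    """Clock cycles available per packet when the CPU is pinned at 100%."""
    return clock_hz / packets_per_second


# The post reports "about 1 to 2kpps", so check both ends of the range.
for pps in (1_000, 2_000):
    cycles = cycles_per_packet(CLOCK_HZ, pps)
    print(f"{pps} pps -> {cycles:,.0f} cycles per packet")
```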

I suppose this is because he runs the probe script once per IP address? I suspect that an implementation that stays in-process would be at least an order of magnitude faster.
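
To make "stay in-process" concrete, here is a minimal sketch of what I have in mind, not the blog author's script (which I haven't seen). It assumes a probe only needs to connect to port 21 and read the greeting banner. The names `probe_ftp` and `scan`, the timeout, and the concurrency limit are all made up for illustration.

```python
import asyncio


async def probe_ftp(host, port=21, timeout=3.0):
    """Connect and return the FTP greeting banner, or None on failure."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return None
    try:
        banner = await asyncio.wait_for(reader.readline(), timeout)
        # An empty read means the peer closed without greeting us.
        return banner.decode("latin-1").strip() or None
    except (OSError, asyncio.TimeoutError):
        return None
    finally:
        writer.close()


async def scan(hosts, probe=probe_ftp, concurrency=500):
    """Probe all hosts from one process, with a cap on open connections."""
    sem = asyncio.Semaphore(concurrency)

    async def one(host):
        async with sem:
            return host, await probe(host)

    return dict(await asyncio.gather(*(one(h) for h in hosts)))
```

Usage would be something like `asyncio.run(scan(list_of_ips))`. The point is that one event loop handles every address, so there is no fork/exec and interpreter startup per IP.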


Sure. Faster even with a better scheduler. I just wanted to show how the simplest and most redneck way still finishes in a reasonable amount of time. :-)
I was amazed how fast that went. Was fully expecting the story to unfold with how you rented out 100 AWS servers to complete the task, instead it was just one computer and only took hours.
It's all about reducing data offline before throwing the kitchen sink at the internet.