Hacker News
Show HN: Blitzping – A far faster nping/hping3 SYN-flood alternative with CIDR (github.com)
44 points by VioletVillain 698 days ago
I found hping3 and nmap's nping to be far too slow at sending individual, bare-minimum (40-byte) TCP SYN packets; besides inefficient socket I/O, they also did far too much unnecessary processing in what should have been a tight execution loop. Furthermore, none of them could accept CIDR notation (i.e., a range of IP addresses) for their source IP parameter. Being intended for embedded devices (e.g., low-power MIPS/Arm-based routers), Blitzping depends only on standard POSIX headers and a C11 libc (whether musl or glibc). Even with CIDR prefixes enabled, Blitzping is significantly faster than hping3, nping, and any other tool I found on GitHub.

Here are some of the performance optimizations specifically done on Blitzping:

* Pre-Generation: All the static parts of the packet buffer are generated once, outside of the sendto() tight loop;

* Asynchronous: Raw sockets are configured to be non-blocking by default;

* Multithreading: Multiple threads call sendto() on the same socket; and

* Compiler Flags: Compiling with -Ofast, -flto, and -march=native (though these had little actual effect; by this point, the bottleneck is the kernel's own sendto() routine).

Shown below are comparisons of the three programs across two CPUs (more details in the GitHub repository):

  #      Quad-Core "Rockchip RK3328" CPU @ 1.3 GHz. (ARMv8-A)        #
  +--------------------+--------------+--------------+---------------+
  | ARM (4 x 1.3 GHz)  | nping        | hping3       | Blitzping     |
  +--------------------+--------------+--------------+---------------+
  | Num. Instances     | 4 (1 thread) | 4 (1 thread) | 1 (4 threads) |
  | Pkts. per Second   | ~65,000      | ~80,000      | ~275,000      |
  | Bandwidth (MiB/s)  | ~2.50        | ~3.00        | ~10.50        |
  +--------------------+--------------+--------------+---------------+

  # Single-Core "Qualcomm Atheros QCA9533" SoC @ 650 MHz. (MIPS32r2) #
  +--------------------+--------------+--------------+---------------+
  | MIPS (1 x 650 MHz) | nping        | hping3       | Blitzping     |
  +--------------------+--------------+--------------+---------------+
  | Num. Instances     | 1 (1 thread) | 1 (1 thread) | 1 (1 thread)  |
  | Pkts. per Second   | ~5,000       | ~10,000      | ~25,000       |
  | Bandwidth (MiB/s)  | ~0.20        | ~0.40        | ~1.00         |
  +--------------------+--------------+--------------+---------------+

I tested Blitzping against both hping3 and nping on two different routers, both running OpenWrt 23.05.03 (Linux kernel v5.15.150) with the "masquerading" option (i.e., NAT) turned off in the firewall; one device had a single-core 32-bit MIPS SoC, and the other a 64-bit quad-core ARMv8 CPU. On the quad-core CPU, because neither hping3 nor nping supports multithreading (unlike Blitzping), I made the competition "fairer" by launching each of them as four individual processes, whereas Blitzping ran as a single process. Across all runs and on both devices, CPU usage stayed at 100%, entirely dedicated to the program under test. Finally, the connection itself was not a bottleneck: both devices were connected through a WAN Ethernet interface to an otherwise-unused 200 Mb/s (23.8419 MiB/s) download/upload line.

It is important to note that Blitzping was not doing any less work than hping3 and nping; in fact, it was doing more. While hping3 and nping only randomized the source port of each packet (keeping a fixed source IP address), Blitzping randomized not only the source port but also the source IP within a CIDR range, which is more computationally intensive and is a feature that both hping3 and nping lack in the first place. Lastly, hping3 and nping were both launched with "best-case" command-line parameters to maximize their speed and disable runtime stdio logging.

4 comments

For posterity: I've since managed to make Blitzping much faster; on the ARMv8-A device, it went from ~10.5 MiB/s to ~120 MiB/s. Instead of sendto(), I used a single connect() call to bind the raw socket to its destination, and then used writev() calls to "queue" many packets for sending. Internally, writev() (and its counterpart, sendmmsg()) still uses the same kind of for-loop to iterate through its queue, but that loop runs in kernelspace, so fewer userspace-to-kernelspace syscalls are needed to accomplish the same goal.
This looks interesting; I might try attacking my own nodes with it. Out of curiosity, have you done anything to make the packets look more "real" than hping3's? I ask because floods from hping3 can be dropped with a single iptables rule, as hping3 does not set the header options that would make its SYN packets appear legitimate, MSS being the most obvious.

Have you also tried taking on the reverse challenge, a.k.a. blue team, and defending against your own tool? What methods would you use if someone were using this against servers you wanted to keep up? E.g., a CDN? IP stack hardening? eBPF rules?

Thanks, I hope it'll be useful in your experiments. As for looking more realistic, I think it does the job; when I used Wireshark to check how different applications (e.g., games, programs) communicated with their servers, I saw that they were sending empty SYN packets (i.e., no options) to initiate the TCP handshake.

Although hping3 does not let you specify TCP options directly, it still lets you append them as raw data. In any case, only one application that I tested actually sent its SYN packets with TCP options to its server. However, that specific server still ACK'ed my packets and proceeded normally, even though I did not send those options. Either way, for targets that actually discriminate based on options, it would be easy to just append the options as raw data at the end of the packet buffer.

None of the real-world applications that I tested used fragmentation, and they all tagged their packets with "DF" (do not fragment); due to that, I did not really look into how fragmentation would affect this.

The reason I specifically included CIDR support, which both nping and hping3 lack for their source IP parameter, is that ISPs or mid-route routers could block "nonsense" traffic coming from a client (e.g., a source IP outside of the ISP's or even the country's allocated range); this is called "egress filtering." However, in a DHCP-assigned scenario, you still have thousands of IP addresses in your subnet/range to spoof, which should not get caught by this filtering. Ultimately, because the source IP would come from an entire range of addresses, firewalls (and CDNs) would have a harder time detecting a pattern to block.

As for defending against TCP SYN flooding in general, you could use "SYN Cookies" (https://en.wikipedia.org/wiki/SYN_cookies) to expend fewer resources on keeping connections open. Unfortunately, in a residential ISP scenario where you only have one "real" IP address, your spoofed source IP would be chosen from a range of otherwise-legitimate addresses; those addresses (upon unexpectedly receiving a SYN-ACK from your targeted server) would usually respond to the original server with an RST packet to terminate the connection, so your "half-open" connections (the goal of SYN flooding) do not last as long. Lastly, this tool only really lets you flood packets as quickly as possible; if you have (or rent) datacenter-grade bandwidth, multiple powerful x86_64 CPUs, and lots of "real" IP addresses at your disposal, there would be little that your target could actually do to prevent you from DDoS'ing them. At that point, in the worst-case scenario, they could go "scorched earth" and reject all traffic from entire countries' ranges (saving their CPU/memory usage), but your packets would still be routed there, and you could still saturate their download line.

As for how this could be made faster (without just throwing more cores and computational power at it), the entire overhead is in the Linux kernel's sendto() path; perhaps writing a kernel module that communicates directly with the underlying NIC is the way to go.

> faster

Check out {send,recv}mmsg before something io_uring-ish imho. One syscall/ctxt, many packets.

Thanks, that might be exactly what I am looking for; I'll check it out.
Those numbers seem awfully low; even going via the relatively slow kernel/kernel stack, you can comfortably achieve around 2Mpps (64B) tx, per core (single flow), on ~2010 commodity hardware via standard Linux userspace APIs.

For reference, bypassing the kernel you can saturate a 10G link for ~14Mpps on a single downclocked 500MHz core with same class of hardware.
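For context, the ~14 Mpps figure matches the theoretical line rate of 10 Gigabit Ethernet for minimum-size frames: besides the 64-byte frame itself, each packet occupies 8 bytes of preamble/start-of-frame delimiter and a 12-byte inter-frame gap on the wire. A 40-byte SYN is padded up to that 64-byte minimum frame, so the same bound applies to it.

```latex
\text{pps}_{\max}
  = \frac{10 \times 10^{9}\ \text{bit/s}}{(64 + 8 + 12)\ \text{B} \times 8\ \text{bit/B}}
  = \frac{10^{10}}{672}
  \approx 14.88 \times 10^{6}\ \text{packets/s}
```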

Stuck on phone at the moment; will check out the code later.

Can you help me understand the intention of this other than just being a DoS tool?
As of now, the code is just a proof of concept for achieving higher throughput than what currently available tools (e.g., nping/hping3) are capable of. Other than that, nping is just too slow, and hping3 has not been updated in 12 years; in any case, both of them lack proper support for "newer" TCP/IP features (e.g., DSCP/ECN instead of IP ToS, or TCP Options in general).

I am currently in the process of rewriting this proof of concept into a full-fledged alternative to those tools. At first, I was planning to fork hping3 to maintain it, but its code just had too many questionable design choices; there were global variables and unnecessary function calls all over the place.

A 10mbps SYN flood DoS tool?
Those underpowered processors were only used as an optimization benchmark; if it runs well enough there, you could always throw more computational power and cores at it.

EDIT: Also, the numbers were in MiB/s (mebibytes per second), not megabits per second; 10.5 MiB/s would be ~88 Mb/s (megabits per second).
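The conversion works out as follows, using 1 MiB = 1,048,576 bytes and 8 bits per byte:

```latex
10.5\ \tfrac{\text{MiB}}{\text{s}} \times 1{,}048{,}576\ \tfrac{\text{B}}{\text{MiB}} \times 8\ \tfrac{\text{bit}}{\text{B}}
  = 88{,}080{,}384\ \tfrac{\text{bit}}{\text{s}}
  \approx 88.08\ \text{Mb/s}
```

That is roughly 44% of the 200 Mb/s test line described in the original post.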

Again, I am under the impression that SYN flood is an essentially solved problem in the linux kernel and is defeated by the use of SYN cookies, which leaves the main DoS mechanism to be BW exhaustion... I'm pretty sure there are more effective ways to achieve this...

As for the question "wtf is this useful for": debugging issues in your network. E.g., I recently used ping -f to track down an Ethernet cable causing around 0.5% packet loss...