Using ClickHouse for L7 DDoS and Bot Traffic Analytics with Tempesta FW

Most open-source L7 DDoS mitigation and bot-protection approaches rely on challenges (e.g., CAPTCHA or JavaScript proof-of-work) or static rules based on the User-Agent, Referer, or client geolocation. These techniques are increasingly ineffective, as they are easily bypassed by modern open-source impersonation libraries and paid cloud proxy networks.

We explore a different approach: classifying HTTP client requests in near real time using ClickHouse as the primary analytics backend.

We collect access logs directly from Tempesta FW (https://github.com/tempesta-tech/tempesta), a high-performance open-source hybrid of an HTTP reverse proxy and a firewall. Tempesta FW implements zero-copy per-CPU log shipping into ClickHouse, so the dataset growth rate is limited only by ClickHouse bulk ingestion performance - which is very high.

WebShield (https://github.com/tempesta-tech/webshield/), a small open-source Python daemon:

* periodically executes analytic queries to detect spikes in traffic (requests or bytes per second), response delays, surges in HTTP error codes, and other anomalies;

* upon detecting a spike, classifies the clients and validates the current model;

* if the model is validated, automatically blocks malicious clients by IP, TLS fingerprints, or HTTP fingerprints.

To simplify and accelerate classification — whether automatic or manual — we introduced a new TLS fingerprinting method.

WebShield is a small and simple daemon, yet it is effective against multi-thousand-IP botnets.