by czbvyRZNsVcpTm2 1470 days ago

A rookie researcher in this area here.

We are currently working on how to solve DDoS in a better way (I can't share the details now, but it will eventually be published as a paper).

Most previous research has validated its proposals on either a synthetic dataset or an old one. Two well-known examples are: 1. https://www.unb.ca/cic/datasets/ddos-2019.html 2. https://www.caida.org/catalog/datasets/ddos-20070804_dataset...

Aside from those, plenty of other datasets are available. However, I feel that a synthetic dataset does not represent real-world traffic, and the old dataset is ancient by now. It also contains only the attack traffic, since the legitimate traffic has been removed.

On the other side of the world, there are these "DDoS-for-hire" people, who seem to have a sizable army behind them and, I think, charge a reasonable price. Traffic from them would provide the "newness" and "real-world"-ness of the data needed to verify our new proposal. Let's say I hire them to attack me and capture the traffic on my side. As long as I have a powerful enough machine, I could save the whole attack as a "real-world" dataset.

However, this is problematic from an ethical perspective: anyone on the path between me (as the victim) and the adversaries would also be DDoS-ed to some extent. Hiring a DDoS-as-a-service provider is also considered risky for my job [0]. After thinking about it for some time, I concluded that the big players, i.e., Cloudflare, Fastly, etc., are the ones best placed to own this kind of data. Does anyone know whether they share such data?

[0] https://portswigger.net/daily-swig/dutch-police-warn-ddos-fo...