by sudosysgen 1451 days ago
I will bet $10 that with reverse DNS plus DPI to suss out page size and caching behaviour, you can identify anyone accessing this website and downloading the 7TB database.
1 comment

No one has to "access this website", because its contents can be read in the Internet Archive, Common Crawl, Google Cache, etc. Page size and caching behaviour will not work as signals if the person uses HTTP/1.1 pipelining to request multiple pages, from a variety of websites, from the Internet Archive over a single TCP connection. (That means using the CDX API, not the HTML form on the Wayback Machine page.)
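
For anyone unfamiliar with pipelining, here is a rough sketch of the idea, not a tested recipe. It writes several CDX lookups back to back on one TLS connection before reading anything, so the responses arrive as one stream. The function names `build_pipeline` and `send_pipeline` are made up for illustration, and whether web.archive.org actually honours pipelined requests is an assumption here.

```python
import socket
import ssl
from urllib.parse import urlencode

ARCHIVE_HOST = "web.archive.org"  # assumed host for the CDX endpoint


def build_pipeline(urls, host=ARCHIVE_HOST):
    """Build back-to-back HTTP/1.1 GET requests, one CDX lookup per URL."""
    requests = []
    for i, url in enumerate(urls):
        last = i == len(urls) - 1
        query = urlencode({"url": url, "limit": "1"})
        requests.append(
            f"GET /cdx/search/cdx?{query} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            # Only the final request asks the server to close the connection.
            f"Connection: {'close' if last else 'keep-alive'}\r\n"
            "\r\n"
        )
    return "".join(requests).encode("ascii")


def send_pipeline(payload, host=ARCHIVE_HOST, port=443):
    """Send every request in one write, then read until the server closes.

    Assumes the server answers pipelined requests in order; many servers
    and intermediaries do not, so treat this as illustrative only.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=30) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(payload)
            chunks = []
            while True:
                data = tls.recv(65536)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    payload = build_pipeline(["example.com", "example.org", "example.net"])
    print(send_pipeline(payload)[:500])
```

Because the responses come back concatenated, an observer sees one long encrypted transfer rather than a separate request/response pair per page.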

The 7TB database is distributed via torrent, not via HTTPS. No rDNS needed.