by cyphar 2580 days ago
Less than 3% of Tor traffic is to onion services of any kind (which means 97% is to websites already accessible on the public internet), and the most popular onion service on the internet by a large margin is Facebook's (facebookcorewwwi.onion). More than 2 million people use Tor every day -- are they all bad people? Heck, government agents use Tor when traveling abroad.

Do bad people do bad things using Tor? Yes. Do political dissidents in oppressive regimes use Tor? Yes.

However, the vast majority of people are just ordinary citizens using Tor to access the internet -- the cross-section of Tor users is the same as the cross-section of ordinary internet users.


> the most popular onion service on the internet by a large margin is Facebook's

How do you know? It shouldn't be possible to collect this sort of data.

Counting hits on HSDir(s) and extrapolating a statistic. Related: https://trac.torproject.org/projects/tor/ticket/8106

In 2016, Facebook published an article saying that 1 million people use Facebook (over their onion address) every month[1]. Comparing this with the privacy-preserving statistics provided by the Tor Project (based on extrapolating HSDir hits) leads you to believe that 1 million per month is the overwhelming majority of .onion site users.

Roger Dingledine mentions this in quite a few of his talks, so I'm fairly sure it's an accurate statement.

[1]: https://www.facebook.com/notes/facebook-over-tor/1-million-p...

Exit nodes can track which sites are hit to a degree. CDNs make this more difficult, but it's not too hard to figure out what percentage of your traffic is Facebook. It also won't work if you're going to the Facebook onion site of course.

Exit nodes aren't used like that for .onion sites, so they cannot track usage of .onion sites.

The way it works is that the client and server pick a "rendezvous node" (the server generates 6 HSDir entries, each with 3 random nodes every day, and the client picks a random HSDir entry and a random one of those nodes to use). Then, they communicate through the rendezvous node, which doesn't know who the client or server are (because both are connected through Tor circuits and neither reveals the .onion URL that was looked up in the HSDir).

The way the statistics work is that some Tor relays opt in to sharing statistics about how many HSDir lookups happened through them, and then those figures are extrapolated to estimate how many .onion service accesses happen. The relay doesn't know which service is being looked up, and the rendezvous node doesn't know which service is being talked to.
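
A rough sketch of that kind of extrapolation, assuming each reporting relay can estimate what fraction of all HSDir lookups it is positioned to see. The function name and every number here are made up for illustration; this is not the Tor Project's actual estimator, just the basic "scale up from a sample" idea:

```python
def extrapolate_total(observed_counts, observed_fractions):
    """Scale lookups seen by reporting relays up to a network-wide estimate.

    observed_counts    -- lookups counted by each opted-in relay (hypothetical)
    observed_fractions -- share of all lookups each relay is expected to see
                          (hypothetical; assumed to be known)
    """
    if len(observed_counts) != len(observed_fractions):
        raise ValueError("need one fraction per reporting relay")
    total_fraction = sum(observed_fractions)
    if total_fraction <= 0:
        raise ValueError("reporting relays must cover a positive fraction")
    # If the sample covers, say, 2% of lookups, the total is sample / 0.02.
    return sum(observed_counts) / total_fraction


# Illustrative values only: three relays that together see ~1.5% of lookups.
estimate = extrapolate_total([1200, 950, 400], [0.006, 0.005, 0.004])
print(f"estimated network-wide lookups: {estimate:.0f}")
```

Note that nothing in the inputs identifies a particular .onion address, which matches the point above: the relays only contribute counts.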

(Correction: the service has 3 introduction points, and the client picks the rendezvous point -- so even a compromised introduction point is useless, because the node used for communication is different for every connection.)
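
A toy model of the corrected flow, to make the roles concrete. It only shows who picks what: there are no circuits, crypto, or descriptors here, and the relay names and helper functions are hypothetical. It also assumes the rendezvous point is chosen uniformly at random from all relays, which is a simplification:

```python
import random


def publish_intro_points(relays, rng, count=3):
    """Service side: choose the introduction points it will advertise."""
    return rng.sample(relays, count)


def connect(relays, intro_points, rng):
    """Client side: choose a rendezvous point, then contact one
    introduction point to tell the service where to meet."""
    return {
        "introduction_point": rng.choice(intro_points),
        "rendezvous_point": rng.choice(relays),  # chosen fresh per connection
    }


rng = random.Random(0)  # seeded only so the sketch is repeatable
relays = [f"relay{i}" for i in range(20)]  # hypothetical relay names
intro_points = publish_intro_points(relays, rng)

connections = [connect(relays, intro_points, rng) for _ in range(5)]
for c in connections:
    print(c["introduction_point"], "->", c["rendezvous_point"])
```

The introduction point only ever appears in the setup step; the traffic itself would go through whichever rendezvous point the client picked for that connection.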