Less than 3% of Tor traffic is to onion services of any kind (which means 97% is to websites already accessible on the public internet), and the most popular onion service on the internet by a large margin is Facebook's (facebookcorewwwi.onion). More than 2 million people use Tor every day -- are they all bad people? Heck, government agents use Tor when traveling abroad.
Do bad people do bad things using Tor? Yes. Do political dissidents in oppressive regimes use Tor? Yes.
However the vast majority of people are just ordinary citizens using Tor to access the internet -- the cross-section of Tor users is the same as the cross-section of ordinary internet users.
In 2016, Facebook published an article saying that 1 million people use Facebook (over their onion address) every month[1]. Comparing this with the privacy preserving statistics provided by the Tor project (based on extrapolating HSDir hits) leads you to believe that 1 million per month is the overwhelming majority of .onion site users.
Roger Dingledine mentions this in quite a few of his talks, so I'm fairly sure it's an accurate statement.
Exit nodes can track which sites are hit to a degree. CDNs make this more difficult, but it's not too hard to figure out what percentage of your traffic is Facebook. It also won't work if you're going to the Facebook onion site of course.
Exit nodes aren't used like that for .onion sites, so they cannot track usage of .onion sites.
The way it works is that the client and server pick a "rendezvous node" (the server generates 6 HSDir entries, each with 3 random nodes every day, and the client picks a random HSDir entry and a random one of those nodes to use). Then, they communicate through the rendezvous node, which doesn't know who the client or server are (because both are connected through Tor circuits and neither reveals the .onion URL that was looked up in the HSDir).
The way the statistics work is that some Tor relays opt-in to sharing statistics about how many HSDir lookups happened through them, and then those figures are extrapolated to figure out how many .onion service accesses happen. The relay doesn't know which service is being looked up, and the rendezvous node doesn't know which service is being talked to.
(Correction, 3 introduction points and the client picks the rendezvous point -- so even a compromised introduction point is useless because the node used for communication is different for all communications.)
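As a rough illustration of the extrapolation from opt-in relay statistics: if the reporting relays are assumed to handle a known fraction of all HSDir lookups, the network-wide total can be estimated by scaling up. This is only a sketch; the function name and the figures are made up, and the real Tor Project statistics involve more steps than this.

```python
def extrapolate_onion_lookups(observed_lookups, reporting_fraction):
    """Scale a partial HSDir lookup count up to a network-wide estimate.

    reporting_fraction is the assumed share of all HSDir lookups that
    the opt-in relays handle (a hypothetical input, not a Tor API value).
    """
    if not 0 < reporting_fraction <= 1:
        raise ValueError("reporting_fraction must be in (0, 1]")
    return observed_lookups / reporting_fraction


# Made-up figures, for illustration only.
estimate = extrapolate_onion_lookups(observed_lookups=1000, reporting_fraction=0.25)
print(f"estimated lookups network-wide: {estimate:.0f}")
```

Note that the estimate counts lookups, not which services were looked up, which matches the point that the relay never learns the service.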
I wish people would at least learn the difference between "deep web" and "dark web". ;)
I bet you use the "deep web" multiple times each week. The "dark web" on the other hand, probably not.
I transparently use the darknet continuously every day.
Multiple home servers owned by me and my colleagues make up a VPN we share with friends and family.
Amongst the trusted recursive resolvers we use there's the DoT v3 onion from Cloudflare.
A proxy redirects our traffic for Facebook and DuckDuckGo over the respective onions, same for Debian updates.
A next generation firewall inspects our traffic and uses Tor for some websites that are censored or geoblocked.
You have to be a little wary using Tor. Anyone can run an exit node, and it is trivial to rewrite and inject content into web pages. You can also intercept SSL requests on the fly and generate your own self-signed certificate that fails proper verification but looks real enough on inspection that it will always trick a percentage of users. If you've used Tor with any frequency you've probably hit weird SSL cert errors that go away if you change routes.
To be fair I mostly use it for not overly sensitive stuff. Let me give you an idea: I prefer to not have my ISP log my requests to reddit.com/r/LSD.
Not because I do anything illegal (I don't even take acid), but in this dystopian world where every action on the internet is recorded, the last thing I want is to end up on lists purely because of my curiosity.
If I were to do anything I could get into trouble for (which I won't), I would definitely research more about how to use Tor safely.
Please correct me if I'm wrong, but can't your ISP only see that you're requesting reddit.com, as long as you're using https? Now sure, if you go to lsd.reddit.com, it can be logged as a subdomain, but anything beyond reddit.com shouldn't be viewable by your ISP.
I'm not saying that you shouldn't use tor, just that as far as I understand, the whole request, including path and method, is encrypted over tls/ssl after your browser establishes a tcp connection to the server.
With ordinary DNS you are asking in plain text "hey, what's the IP address for reddit.com?" and it does not take a genius to guess that's because you're visiting reddit.com.
With HTTPS using TLS 1.2 or earlier the site sends its certificate in plaintext too, so even if you just remember the IP address, it will tell anybody snooping "Hi, this is reddit.com".
In TLS 1.3 the site's certificate is encrypted. However the SNI, which is used to make virtual hosting work, is not encrypted. So your ISP can see where you said you were going, but not whether they proved they were the real deal.
DPRIVE such as DNS over HTTPS cures the first problem: you use an encrypted transport to do DNS queries against somebody trustworthy who won't rat you out.
eSNI (encrypted SNI) is intended to one day cure the other problem.
Even with both these, seeing that you visited a very popular system like Facebook or Reddit is always going to be easy. So Tor remains important.
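To make the breakdown above concrete, here's a rough Python sketch of what an on-path observer such as an ISP could read for an HTTPS request under each combination. It's a toy model of the points made in this thread, not a packet parser; the function name and dictionary keys are invented for illustration, and it ignores the destination IP address, which stays visible regardless.

```python
from urllib.parse import urlsplit


def visible_to_isp(url, encrypted_dns=False, encrypted_sni=False, tls13=True):
    """Toy model: which parts of an HTTPS request an on-path observer can read.

    Returns a dict whose values are the leaked string, or None if that
    channel reveals nothing. All names here are hypothetical.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("this sketch only models HTTPS requests")
    host = parts.hostname
    return {
        # Ordinary DNS asks for the hostname in plain text.
        "dns": None if encrypted_dns else host,
        # SNI names the host in the clear unless it is encrypted.
        "sni": None if encrypted_sni else host,
        # TLS 1.2 and earlier send the certificate in plaintext.
        "certificate": None if tls13 else host,
        # Path and method are always inside the encrypted channel.
        "path": None,
    }


print(visible_to_isp("https://reddit.com/r/LSD"))
print(visible_to_isp("https://reddit.com/r/LSD", encrypted_dns=True, encrypted_sni=True))
```

In both calls the path stays hidden; what changes is whether the hostname leaks through DNS or SNI.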
It is fair to say that using unauthenticated protocols like HTTP over Tor is a pretty bad idea (and there really should be more warning bells about this in the Tor Browser). However on the TLS comment -- almost all modern websites use HSTS, so sslstrip doesn't really work any more.
I mean you can intercept the request, retrieve the real cert, generate a self-signed cert with the exact same details, then submit that to the user and be the man in the middle. Of course the user gets the blank SSL cert error page in the browser, but a percentage of those users will override and continue. Copying the cert details increases that percentage, as some will actually look at the invalid cert. It is quite blatant, but it is just a numbers game at that point. If you ever hit an SSL cert error with Tor you should force a new onion path.
Yes, you could do that but then your node would be kicked off the Tor network (because you'd need to do it indiscriminately since you don't know who the user is you're trying to target). In addition, relays are load-balanced based on trustworthiness and bandwidth so in order to attack a significant portion of users you'd need to be running a large and trusted node (which would be hard to do if you're just doing this to attack people).
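A rough way to see why a small node reaches few users: assume, as a simplification, that clients pick an exit with probability proportional to a bandwidth weight. The real selection also takes flags and other weighting into account, so treat this as a sketch; the relay names and weights below are made up.

```python
def exit_selection_share(relay_weights, relay_name):
    """Fraction of exit selections expected to land on one relay,
    assuming selection is proportional to weight (a simplification)."""
    total = sum(relay_weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return relay_weights[relay_name] / total


# Hypothetical weights: one small attacker relay among larger exits.
weights = {"attacker": 5, "exit_a": 400, "exit_b": 350, "exit_c": 245}
share = exit_selection_share(weights, "attacker")
print(f"attacker sees about {share:.1%} of exit selections")
```

Under that assumption, the only way to see more traffic is to contribute more bandwidth over time, which is the cost described above.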
I wasn't aware that Tor tested services and had a trustworthiness score but an attack like that could still be quite useful for certain purposes and possibly stay well hidden. If you set something up that only did it for Google IP blocks for example it might go undetected. If you actually got shut down you could refine it by only targeting a small percentage of those users. There would be some rate of account collection, however small.
How about sslstrip2 ([1], check demo)? A weakness of HSTS is that it is stored per domain, and the exit node can also control your DNS traffic. I wonder how hard it is to pull this off as a Tor exit node; for local networks there are tools like bettercap [2].
That is a pretty neat attack, but I disagree it would be useful against Tor.
DNS traffic is funneled through a different Tor circuit than the web traffic. You'd need to apply the bad DNS to all users, which would almost certainly result in your exit node being dropped from the network.
I'm also not sure how this would be handled with HSTS preload lists -- HSTS preload applies to all subdomains so you'd need to come up with a completely different domain (and protections against homograph attacks mean that avenue is restricted). It'd probably be simpler to just set up an actual website with LetsEncrypt than to bother with stripping the TLS in this manner.
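The per-domain point can be sketched in a few lines of Python. This assumes a simplified HSTS store that maps a hostname to whether includeSubDomains was set; the function name, store layout, and hostnames (including the look-alike wwww.example.com) are hypothetical.

```python
def hsts_covers(host, hsts_store):
    """Return True if the browser would force HTTPS for this host.

    hsts_store maps a known HSTS hostname to its includeSubDomains flag.
    A host is covered by an exact match, or by any parent domain whose
    entry has includeSubDomains set.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in hsts_store and (i == 0 or hsts_store[candidate]):
            return True
    return False


# HSTS remembered only for the exact host: a look-alike name slips through.
per_host = {"www.example.com": False}
print(hsts_covers("www.example.com", per_host))
print(hsts_covers("wwww.example.com", per_host))

# With includeSubDomains on the parent (as preloading requires), it is covered.
with_subdomains = {"example.com": True}
print(hsts_covers("wwww.example.com", with_subdomains))
```

That is why a stripping attack against a preloaded site would need a completely different registered domain rather than a subdomain.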
You are right. With different Tor circuits, the attacker needs to control a lot of exit nodes to correlate the initial HTTP request to the ssl-stripped page with the DNS query (that is, to be a global adversary).
Unfortunately, such cases will always be less appealing to write about compared to "assassins for hire on the dark web", leading to this wrong generalization of what Tor is about.
If I'm using the right definition of the dark/deep web, then it includes traffic to and from any networked entity that does not have a URL (or otherwise public frontend).
This could then include stored data, VPNs or other company/govt/organisational data that is not accessible via normal web traffic.
Deep web: stuff not indexed by search engines. Private forums, non-public social media accounts, Telegram rooms, Discord servers etc. are technically "deep web".
Dark web: a subset of deep web that requires specific software or configuration to access. Slightly more precise, but still includes every possible use case for IPFS, Dat, ".onion" etc. Note that this is nowhere close to what people usually mean when they use the term "dark web". They're referring to the subset of a subset of deep web that's used for criminal activities.
There are gateways to onion services and IPFS, so those are "indexed by search engines" without any change necessary. Furthermore, any search engine has to be adapted to the medium used, and there are specialized search engines for pretty much anything including Freenet and I2P etc, so saying that the "dark web" is a subset of the "deep web" is incorrect. There is some overlap, but it's not a "part of" relationship.
The problem is that there is one (academic) definition of "deep web", but many incompatible definitions of "dark web", invented by the media basically for whatever they want it to be.
I host all kinds of completely normal websites (e.g., amateur radio) as Tor hidden services. Tor is great because you actually own your domain instead of just leasing it on the whim of some corporation.
Once you get past the controversy, Tor hidden services are more like the 1990s web than what you describe.
When I went to China I expected problems, so I set up my laptop with an SSL tunnel on port 443 to a virtual server and then routed OpenVPN over that. It worked like a charm. My favorite feature of OpenVPN is that it can maintain state, so even if the tunnel resets and OpenVPN has to reconnect, all the TCP connections just pick up where they left off.
This will work for a short while, but consistent long-term OpenVPN-matching packets are now detected by the GFW's automated DPI systems, and eventually the IP of your non-China VPN endpoint will get blocked.
Openvpn allows you to connect to and have a routable IP on the network. SSH tunnels are great for some things but being logically on a network is another thing.