by barbecue_sauce 2560 days ago
Anybody have a sense of the performance overhead of using hosts files versus a detached hardware solution like a pihole?
5 comments

My understanding is that the difference is in scope, not performance.

Hosts files will only affect the host (workstation/desktop/laptop etc) they're installed on.

Things like Pi-hole try to make it easy to apply the solution to every device on your network. Even in a household these days that can number in the dozens, which makes it impractical to manage a hosts file on each of them. (This includes devices like phones, where editing the hosts file is typically infeasible.)

Pi-hole also has a nice browser interface for debugging blocked requests that break a site you don’t want broken. That happens inevitably when you pull together 10 different blocklist sources... or just one person’s, whose ideal blacklist doesn’t match yours.
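
For what it’s worth, here is a minimal sketch of what pulling several hosts-format blocklists together, and then carving out exceptions for sites that break, might look like. The function names and the allowlist parameter are illustrative assumptions; this is not how Pi-hole itself implements it.

```python
from typing import Iterable, Set

# Addresses that blocklists commonly use to "sink" a hostname.
SINK_ADDRESSES = {"0.0.0.0", "127.0.0.1"}
# Names that legitimately point at the local machine and must not be treated as blocked.
LOCAL_NAMES = {"localhost", "localhost.localdomain", "broadcasthost", "0.0.0.0"}


def parse_hosts(text: str) -> Set[str]:
    """Return the hostnames that a hosts-format blocklist sinks."""
    blocked: Set[str] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or fields[0] not in SINK_ADDRESSES:
            continue  # a real mapping (e.g. a LAN host), not a block entry
        for name in fields[1:]:
            name = name.lower()
            if name not in LOCAL_NAMES:
                blocked.add(name)
    return blocked


def merge_blocklists(sources: Iterable[str], allowlist: Iterable[str] = ()) -> Set[str]:
    """Union all sources, then remove hostnames the user wants to keep working."""
    merged: Set[str] = set()
    for text in sources:
        merged |= parse_hosts(text)
    return merged - {name.lower() for name in allowlist}
```

The allowlist is applied last so that an exception wins no matter which source listed the hostname.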

It would be nice to see a performance hit based on the number of hosts entries.

Not much.

62,448-line (63,370 actual '0.0.0.0' entries) /etc/hosts file, resolving 'www.google.com' 100 times, Debian GNU/Linux, Thinkpad with spinning rust.

The short version has 32 lines, with 14 active entries, mostly defaults and local systems.

Short hosts:

    $ for i in {1..100}; do time host www.google.com; done 2>&1| grep real |  sed 's/^real[       ]*//; s/0m//; s/s$//' | mean
    n: 100, sum: 2.209, min: 0.015, max: 0.052, mean: 0.022090, median: 0.02, sd: 0.007450
    %-ile:  5: 0.016, 10: 0.016, 15: 0.016, 20: 0.016, 
    25: 0.0165, 30: 0.02, 35: 0.02, 40: 0.02, 45: 0.02, 
    55: 0.02, 60: 0.02, 65: 0.02, 70: 0.021, 75: 0.022, 
    80: 0.0245, 85: 0.029, 90: 0.033, 95: 0.0385

Big hosts:

    $ for i in {1..100}; do time host www.google.com; done 2>&1| grep real |  sed 's/^real[       ]*//; s/0m//; s/s$//' | mean
    n: 100, sum: 2.517, min: 0.016, max: 0.063, mean: 0.025170, median: 0.023, sd: 0.009818
    %-ile:  5: 0.016, 10: 0.016, 15: 0.016, 20: 0.016, 
    25: 0.017, 30: 0.0185, 35: 0.02, 40: 0.021, 45: 0.022, 
    55: 0.024, 60: 0.0255, 65: 0.0265, 70: 0.028, 75: 0.029, 
    80: 0.03, 85: 0.0325, 90: 0.0395, 95: 0.042

The delta of means is 0.003080s -- call it 3ms slower for the large hosts file.

("mean" is an awk script for computing univariate moments.)
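
The awk script itself isn't shown, so the following is only a rough Python stand-in that produces the same kind of summary (n, sum, min, max, mean, median, sd) from a list of timings. The function name `summarize` is made up for illustration, and it assumes the sample standard deviation; the original script may compute sd differently.

```python
import statistics
from typing import Dict, List


def summarize(samples: List[float]) -> Dict[str, float]:
    """Summary statistics for a list of timings, in seconds."""
    return {
        "n": len(samples),
        "sum": sum(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        # Sample standard deviation (n - 1 denominator); an assumption,
        # since the original awk script is not shown.
        "sd": statistics.stdev(samples),
    }
```

Feeding it the per-run `real` times from the loop above, one float per run, should give numbers comparable to the summaries shown, up to the choice of sd formula.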

As others have mentioned, the main benefit of a centralised LAN service is that all devices on the LAN are protected. The hosts file on this system (a laptop) is effective regardless of where I am. It also pre-dates my configuring OpenWRT's adblock package about a month ago, though I'd had a hand-rolled DNSMasq configuration earlier. The laptop hosts file is almost certainly a few years out of date -- another occupational hazard of such things.

The OpenWRT solution runs on the Knot Resolver (kresd) caching nameserver. I've not noted any lag for it. The blocklist there is currently 231,627 hosts/domains (roughly doubled: specific + wildcard matches), from 0-29.com to zzzpooeaz-france.com.
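
To illustrate what "specific + wildcard" matching means in practice: each blocked domain matches itself and anything underneath it. Below is a minimal sketch of that idea as a suffix check against a set. It is an illustration only, not how kresd or the adblock package implements it, and `is_blocked` is a hypothetical name.

```python
from typing import Set


def is_blocked(hostname: str, blocked_domains: Set[str]) -> bool:
    """True if hostname equals a blocked domain or is a subdomain of one."""
    labels = hostname.lower().rstrip(".").split(".")
    # For "a.b.example.com" this checks "a.b.example.com", "b.example.com",
    # "example.com" and "com" in turn: one set lookup per label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocked_domains:
            return True
    return False
```

Because each lookup is a hash-set membership test, the cost depends on the number of labels in the queried name rather than on the size of the blocklist.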

I used one of the popular hosts files on my local machine for a while: the networking didn’t seem to suffer, but the boot time for my machine slowed noticeably. Manual updates were also painful, because loading the file in an editor is slow, so if you use your hosts file for other reasons it can inhibit your workflow. I would recommend an automated process on some dedicated device so you don’t impact your normal usage.
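
If you do keep the list in a local hosts file, one way to take the manual editing out of it is to have a script own a clearly marked section of the file and leave everything else alone. Here is a minimal sketch of that; the marker strings and the function name are assumptions invented for this example, not a convention any particular tool uses.

```python
from typing import Iterable, List

# Hypothetical marker comments delimiting the script-managed section.
BEGIN = "# BEGIN blocklist (managed)"
END = "# END blocklist (managed)"


def replace_managed_block(hosts_text: str, blocked: Iterable[str]) -> str:
    """Return hosts_text with the managed section replaced by fresh block entries."""
    kept: List[str] = []
    inside = False
    for line in hosts_text.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)  # hand-written entries are preserved untouched
    block = [BEGIN] + [f"0.0.0.0 {name}" for name in sorted(set(blocked))] + [END]
    return "\n".join(kept + block) + "\n"
```

The function only transforms text; reading and writing the real hosts file (which needs elevated privileges) is left to the caller.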

Another experience I had was that certain sites failed to work correctly. I didn’t do extensive testing, but when I disabled the hosts blocking the sites worked, and when I enabled it they broke. These were companies with whom I was trying to do account-related business, so it wasn’t just that something didn’t render correctly: it actively prevented me from updating my accounts when I tried to submit requests.

I still like the approach and will continue to use it, but it hasn’t been frictionless.

When the network goes down, it can take several minutes for it to come back up. I was having DNS and connectivity issues at a LAN party: I wouldn't get connectivity for minutes after a link bounce.

Then I removed the hosts file, and it worked instantly.

Maybe for a static workstation it wouldn't be bad, but for a laptop or something that loses link frequently, it could be an issue.

I don't have any evidence (I've not attempted to benchmark it or anything), but my gut says that the stack is checking the hosts file first anyhow, so the overhead shouldn't be much. It might actually be an improvement over a separate appliance.
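
For anyone who wants to measure this, here is a minimal sketch that times lookups through `socket.getaddrinfo`, which goes through the system's normal name resolution. On a typical Linux setup that path consults /etc/hosts before DNS, whereas a dedicated DNS query tool such as `host` sends queries to the DNS server directly. The function name and the default of 100 repeats are arbitrary choices for illustration, and local caching (nscd, systemd-resolved, etc.) can skew the numbers.

```python
import socket
import time
from typing import List


def time_lookups(hostname: str, repeats: int = 100) -> List[float]:
    """Time repeated lookups of hostname via the system resolver, in seconds."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(hostname, None)
        except socket.gaierror:
            pass  # a failed lookup still costs time, so it is still recorded
        timings.append(time.perf_counter() - start)
    return timings
```

Running it once with a short hosts file and once with a large one, then comparing the means and medians, would give a like-for-like comparison on a given machine.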