Show HN: A DNS server that removes the top million domains

	Show HN: A DNS server that removes the top million domains (millionshort.com)
	48 points by taxonomyman 4925 days ago

12 comments

samwillis 4925 days ago

This is by the same guys as the Million Short search engine (Google minus the top million). Probably good to use the search engine in combination with the DNS server to find things that are not just broken links.

http://www.millionshort.com/

The search engine was discussed on HN before: http://news.ycombinator.com/item?id=3910304

znowi 4925 days ago

Yes, it's not very useful without the search engine. Which I have tried just now and the experience was... frustrating.

Why? Cause it did not return any results for any of my queries (e.g. "hello"). I thought, no, it can't be broken - must be something on my end. I opened it in Chrome incognito window and it worked! Aha, "location based", I thought. And I was right - they use IP location by default to localize results.

I know it is a common practice now, sadly, popularized by Google, et al - but it sucks! I deal with this each time I travel. Can you, please, prioritize this and first look into my actual request header, which explicitly says that I prefer response in English? Thank you very much.
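
To make the request concrete, here is a minimal sketch of "header first, IP second". It is not how Million Short or any other site actually does it; `choose_language`, `ip_guess` and `supported` are hypothetical names, and the IP-based guess is assumed to come from elsewhere.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by original position in the header.
            prefs.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def choose_language(header, ip_guess, supported):
    """Prefer the request header; use the IP-based guess only as a fallback."""
    for tag in parse_accept_language(header or ""):
        if tag == "*":
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # e.g. "en-us" -> "en"
        if primary in supported:
            return primary
    if ip_guess in supported:
        return ip_guess
    return supported[0]  # assumption: the first supported language is the site default
```

With this ordering, a traveller whose browser sends `en-US,en;q=0.9` gets English even when the IP lookup guesses German.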

furyofantares 4925 days ago

Why would an incognito window prevent them from customizing results based on your IP?

dumbfounder 4925 days ago

If millionshort.com made it into the top 1 million domains would it cease to exist?

robotmlg 4925 days ago

Does the set of all sets that don't contain themselves contain itself?

bmmayer1 4925 days ago

Why is this useful in any way, shape or form?

ronnier 4925 days ago

To get on HackerNews.

bmmayer1 4925 days ago

Which is a top 10k site...

gojomo 4925 days ago

A great refinement would be: on the error page, suggest alternate sites with similar content that are still reachable.

Or even: for the exact URL visited, suggest the one page in the remaining long tail that's most like (by some text/semantic measure) the originally-requested page. (Or even: redirect automatically to that page.)

s353 4925 days ago

If you use these servers you may see a lot less advertising. That's because more than a few of the top 100/1000/10000/1000000 sites are actually just ad servers, assuming Million Short is using Alexa as the source. And because they appear in the top Alexa list one might guess those particular ad servers serve a significant share of the internet's advertising.

Another thought is that you could potentially use these as general-purpose DNS servers. They are all on Amazon EC2, I believe, so with respect to the DNS-based geolocation efforts of many websites, you'd be treated as if coming from the location of whatever region the datacenter is in. Just add the top 100/1000/10000/1000000 sites to your HOSTS file.

andrewcooke 4925 days ago

so with respect to the DNS-based geolocation efforts of many websites, you'd be treated as if coming from the location of whatever region the datacenter is in

wut? how does dns based geolocation work? you seem to be saying that sites assume you share the physical location of your dns server?

s353 4925 days ago

No. What sites assume is that you are located (at least in a regional sense) near the (recursive) DNS servers you use. For example, that's how many CDNs work.

Note: It's certainly possible to share the exact same location (or interface, to be more precise) as your DNS server. I run my own personal DNS cache on localhost. It's not unheard of. I'd guess there would be a few other readers of HN who do this as well.

gojomo 4925 days ago

I think that's wrong: there's no way for a site to know what DNS servers I use. Instead, they use a reverse lookup from the apparent IP I'm connecting from... that is available to them, and is unrelated to my DNS servers.

Or can you supply a reference/explanation for how they'd know my DNS servers?

dsl 4925 days ago

Run 'dig +short whoami.ultradns.net' in your terminal. You'll get back the IP of the DNS server you are using.

Your ISP's recursive DNS servers send off a query to the site's authoritative servers, which in turn look at the source IP address. That's how they know. (Source: I've built a few CDNs)
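
A rough sketch of the authoritative side, assuming the only signal is the source IP of the recursive resolver that sent the query. The table, the addresses (all from the RFC 5737 documentation ranges) and `pick_edge` are made up for illustration and are not any real CDN's logic.

```python
import ipaddress

# Hypothetical table: which edge server to hand out for queries arriving
# from resolvers in a given network. All addresses are documentation ranges.
EDGE_BY_RESOLVER_NET = {
    ipaddress.ip_network("192.0.2.0/24"): "198.51.100.10",    # pretend region A
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.20",  # pretend region B
}
DEFAULT_EDGE = "198.51.100.1"  # fallback when the resolver is unknown


def pick_edge(resolver_ip):
    """Choose the A record to answer with, based on the resolver's source IP."""
    addr = ipaddress.ip_address(resolver_ip)
    for net, edge in EDGE_BY_RESOLVER_NET.items():
        if addr in net:
            return edge
    return DEFAULT_EDGE
```

Note that the function never sees the end user's address, only the resolver's, which is why pointing your machine at a resolver in another region changes the answer you get.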

gojomo 4925 days ago

Sure, but that only applies to the CDNs who have been careful to send different answers to different places, for sites relying heavily on such CDNs.

A standalone (single-IP) site not using a CDN, or even a site that uses a CDN solely for bulky static assets, has no direct way to query what DNS servers a client used, other than the fact that those servers resolved the request Host to the listening IP. (Perhaps it could probe by attempting a number of resource loads from hostnames that resolve differently based on different major DNS sources, but that'd be obtrusive and require constant maintenance.)

Especially in the 'long tail' (of not-top-1-million-sites), I'd expect the non-CDN or CDN-only-for-big-assets setup to predominate, and so any geographic adaptation would be more likely based on IP lookups (via a database like from MaxMind), rather than CDN inference.

Or is there some other way even static-asset CDNs somehow communicate their geography-sensing back to primary sites?

andrewcooke 4925 days ago

what he (i assume?) is saying is that when a cdn wants to supply data to you, they want to do so from a server as close as possible.

now, typically, dns is configured so that your dns requests go to servers that are "near" you on the network.

so, say you're looking for google.com. the dns server near you will be configured to say that google.com is a server near you (and near the dns server). effectively they are inferring location from dns lookups (and then providing you with a nearby source).

this is completely different to looking up the requesting ip in a database which is what i originally assumed was being discussed (hence my confusion and perhaps yours). but it (this process for choosing cdn providers) does seem to be called geolocation by cdn people (just google "cdn geolocation").

andrewcooke 4925 days ago

ah, thanks. sure, i understand that - i just never realised it was called geo-location (i have never worked with cdns).

petercooper 4925 days ago

You can get into the top million on Alexa with a minuscule amount of traffic so you'd be extremely limited. Losing the top 1000 would probably be a more interesting experiment for mid/long term purposes.

cfn 4925 days ago

They also have that (and 100k, 10k and 100).

ChikkaChiChi 4925 days ago

What I think would be more interesting is a proxy that only uses the first 1k, 100k, 1m sites.

I might be wrong, but it might be an easy way to keep users on the "bright streets" of the Internet instead of wandering down malware-ridden alleys.

measure2xcut1x 4925 days ago

So what are the criteria for removal? I.e. how does a domain get in the top 1m?

rb2k_ 4925 days ago

they probably just grab the top 1 million CSV file that Alexa provides.

garretruh 4925 days ago

I can only imagine malicious uses for this. "Sorry, you're no longer allowed to access Google, Facebook, Twitter, or Wikipedia." Not that that is entirely a bad thing.

stephengillie 4925 days ago

Someone will use it for one of their anti-distraction productivity tools.

Is changing DNS easier or more difficult than editing a HOSTS file?

ikawe 4925 days ago

edit your /etc/resolv.conf

nameserver 1.2.3.4

dumbfounder 4925 days ago

How did they do their ranking? Is it based on a web crawl, dns stats, other? Is their list of the top million domains public? I would love to see the data.

dumbfounder 4925 days ago

I haven't seen confirmation, but they probably use Alexa, given that Alexa makes it easy to download its top 1 million list:

http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
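
If that guess is right, the filtering step could look roughly like the sketch below. It assumes the unzipped file holds plain `rank,domain` rows and that a name is dropped when it or any parent domain is on the list. Both are assumptions, as are the function names; nothing here is confirmed by Million Short.

```python
import csv
import io


def load_top_domains(csv_text, limit):
    """Parse 'rank,domain' rows and return the set of domains ranked within limit."""
    blocked = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        try:
            rank = int(row[0])
        except ValueError:
            continue  # skip a header row or other non-numeric rank
        if rank <= limit:
            blocked.add(row[1].strip().lower().rstrip("."))
    return blocked


def is_removed(name, blocked):
    """True if the queried name, or any parent domain of it, is in the blocked set."""
    labels = name.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))
```

A resolver built on this would answer as if the domain did not exist whenever `is_removed` returns true and resolve normally otherwise; changing `limit` gives the 100, 10k and 100k variants mentioned elsewhere in the thread.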

smudgymcscmudge 4925 days ago

I don't get it. What's the point of this?

pixie_ 4925 days ago

You get 'popularly obscure' results. Which I think is a better name than million short.

pixie_ 4925 days ago

'IndieSearch' is even better lol.

sukuriant 4925 days ago

It comes already built with a "once it's popular, I don't like it anymore" feature!

makmanalp 4925 days ago

This could work as a pretty neat anti-procrastination tool. HN is ranked 2.9k and reddit is 100-something.
