by necovek 1667 days ago
That's what I suggested with

>> Scan the entire internet for domains pointing to s3-website, and check AWS API to see if it's available?

What I wonder is how do you scan all the DNS records with their subdomains? Unlike IPv4 address space, which is very decidedly finite and not-too-big, the space of all the subdomains is basically infinite.

Other than using AXFR (zone-transfer DNS request) which is usually restricted, you are searching an unbounded space.

I guess you don't even need AWS API calls, since hitting a non-existent bucket over HTTP will tell you: http://something.that.does.not.exist.s3-website-eu-west-1.am...

IOW, how would you write such a bot? :D
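
A minimal sketch of that HTTP check, in Python. The function names are made up for illustration, the hostname pattern is the `s3-website-<region>` form seen in this thread (some regions use a dot instead of a dash), and it assumes the website endpoint answers a missing bucket with a 404 whose body mentions `NoSuchBucket`. Treat it as a starting point, not a definitive probe.

```python
import urllib.error
import urllib.request


def website_url(bucket, region):
    # Dash-style S3 website endpoint, as in the thread's example hostname.
    return f"http://{bucket}.s3-website-{region}.amazonaws.com/"


def looks_unclaimed(status, body):
    # Assumption: a missing bucket yields HTTP 404 with "NoSuchBucket" in the body.
    return status == 404 and "NoSuchBucket" in body


def probe(bucket, region, timeout=10):
    """Fetch the website endpoint and report whether the bucket looks unclaimed."""
    url = website_url(bucket, region)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return looks_unclaimed(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        body = err.read().decode("utf-8", "replace")
        return looks_unclaimed(err.code, body)
```

Only `probe` touches the network; the other two helpers are pure, so the classification can be checked offline.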


> how do you scan all the DNS records with their subdomains?

You needn't do this for stuff that would work in these "Hijack" situations.

Your target is any link that gets visited, maybe following a bookmark somebody made in 2018, maybe it's linked from some page that was never updated, maybe it's in an email somebody archived. If you're phishing you have one set of preferences, if you're doing SEO you have different preferences (you want crawlers to see it but not too many humans).

When anything follows that link, a DNS lookup happens. Most of the world's DNS queries and answers (not who asked, but what is looked up and the answer) are sold in bulk as "passive DNS". You buy a passive DNS feed from one of a handful of big suppliers, or if you're cheap you hijack somebody with money's feed.

So, you're working from a pile like:

  www.google.com A 142.250.200.4
  www.bigbank.com CNAME www1.bigbank.com
  www1.bigbank.com A 10.20.30.40
  charts.dft.gov.uk CNAME charts.dft.gov.uk.s3-website-eu-west-1.amazonaws.com

Obviously you can grep out all those S3 buckets and then you ask S3, hey, does charts.dft.gov.uk exist? And it says of course not, so you create charts.dft.gov.uk as an S3 bucket and you win.
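
The "grep" step could look roughly like this in Python. It assumes the feed has already been flattened into whitespace-separated `name TYPE value` lines like the pile above (real passive DNS feeds come in vendor-specific formats), and `s3_website_cnames` is a made-up name.

```python
import re

# Matches both "s3-website-<region>" and "s3-website.<region>" endpoint styles.
S3_WEBSITE = re.compile(
    r"^(?P<bucket>.+)\.s3-website[.-](?P<region>[a-z0-9-]+)\.amazonaws\.com\.?$",
    re.IGNORECASE,
)


def s3_website_cnames(lines):
    """Yield (hostname, bucket, region) for CNAMEs pointing at S3 website endpoints."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[1].upper() != "CNAME":
            continue  # skip A records and anything malformed
        name, _, target = parts
        match = S3_WEBSITE.match(target)
        if match:
            yield name, match.group("bucket"), match.group("region")


SAMPLE = [
    "www.google.com A 142.250.200.4",
    "www.bigbank.com CNAME www1.bigbank.com",
    "www1.bigbank.com A 10.20.30.40",
    "charts.dft.gov.uk CNAME charts.dft.gov.uk.s3-website-eu-west-1.amazonaws.com",
]

for hit in s3_website_cnames(SAMPLE):
    print(hit)
```

Each hit is then a candidate bucket name to check for existence.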

Watching feeds of Certificate Transparency logs, and optionally going beyond those hostnames by using the newly discovered names to find additional ones, is one approach.

Google hosts a page [0] to search them, but there are other services/APIs that let you consume them in realtime - seeing certificate issuance live.

If you wanted to consume them programmatically without a 3rd party service, everything you need is in this repo [1].

0: https://transparencyreport.google.com/https/certificates

1: https://github.com/google/certificate-transparency-community...
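
As a rough illustration of reading a log directly, here is a Python sketch that only builds the paged `get-entries` request URLs defined by RFC 6962 (`start` and `end` are 0-based and inclusive). This is not taken from the repo above; the log URL is a placeholder, `get_entries_urls` is a made-up helper, and the tree size would normally come from the log's `get-sth` response. Decoding the returned entries into hostnames is a separate job and not shown.

```python
from urllib.parse import urlencode


def get_entries_urls(log_url, tree_size, batch=256):
    """Yield RFC 6962 get-entries URLs covering entries 0..tree_size-1."""
    base = log_url.rstrip("/") + "/ct/v1/get-entries"
    for start in range(0, tree_size, batch):
        # "end" is inclusive, so the last batch stops at tree_size - 1.
        end = min(start + batch, tree_size) - 1
        yield f"{base}?{urlencode({'start': start, 'end': end})}"


# Placeholder log URL and a tiny tree size, purely for illustration.
for url in get_entries_urls("https://ct.example.test/log/", 600):
    print(url)
```

Logs are allowed to return fewer entries than requested, so a real client has to advance `start` by however many entries actually came back.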

There are size and character limits on DNS, so it's not infinite, although it may still be a pretty large space. Charts.(something well known) could have been a dictionary check though.

AXFR makes it a lot easier though.

Ah, I totally forgot about the domain name (255) and label (63) length limits: thanks!

Still, we are looking at roughly 38^255 possible options (a-z, 0-9, a hyphen and dot to separate labels; "roughly" because each label between periods can be up to 63 characters, labels must be non-empty, and hyphens can't start a label).
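
Those constraints can be written down as a small Python validator. This is a sketch under a few assumptions: it counts the 255 limit in wire format (one length byte per label plus the root byte), it also rejects a trailing hyphen (a hostname rule not mentioned above), and `is_valid_hostname` is a made-up name.

```python
import string

ALLOWED = set(string.ascii_lowercase + string.digits + "-")
MAX_LABEL = 63   # per-label limit
MAX_WIRE = 255   # whole-name limit, counted in wire-format octets


def is_valid_hostname(name):
    """Check the length and character limits discussed above."""
    labels = name.lower().rstrip(".").split(".")
    # Wire format: one length byte per label, plus one byte for the root.
    wire_len = sum(len(label) + 1 for label in labels) + 1
    if wire_len > MAX_WIRE:
        return False
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL:
            return False  # empty or over-long label
        if not set(label) <= ALLOWED:
            return False  # character outside a-z, 0-9, hyphen
        if label.startswith("-") or label.endswith("-"):
            return False  # hyphens only in the middle of a label
    return True
```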

As you said, it's pretty large: compared to 2^32 of IPv4 or even 2^128 of IPv6, this is more than (2^5)^255 = 2^1275 options.
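
The comparison is easy to sanity-check with Python's arbitrary-precision integers. This takes the rough 38-symbol, 255-position upper bound at face value and ignores the label rules, so it overcounts.

```python
ALPHABET = 26 + 10 + 1 + 1   # a-z, 0-9, hyphen, dot
MAX_LEN = 255

upper_bound = ALPHABET ** MAX_LEN

# 38 > 32 = 2^5, so the bound must exceed (2^5)^255 = 2^1275.
print(upper_bound > 2 ** 1275)

# Number of bits needed to write the bound down, next to IPv4 and IPv6.
print(upper_bound.bit_length(), 32, 128)
```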