by ishansgupta 3187 days ago
SiteStacks can find the technology used at any domain, including a set of roughly 700,000 that we’re regularly checking.

What makes the dataset unique is that it combines programmatic data (code breadcrumbs, network requests, DNS, some NLP, etc.) with data validated directly by users.

The user-validated data is only available on Siftery (e.g. for sitestacks.com/uber.com you have to follow the link through to siftery.com/company/uber to see the full set), but all the programmatic methods are improved by user-validated data (e.g. if a method yields too many false positives, we drop it).
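
To make the "drop a method with too many false positives" idea concrete, here is a minimal sketch of how such pruning might look. It is not SiteStacks' actual implementation: the `MethodStats` structure, the method names, and both thresholds (`max_fp_rate`, `min_votes`) are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MethodStats:
    """User-validation tallies for one detection method (hypothetical structure)."""
    confirmed: int = 0  # detections users confirmed as correct
    rejected: int = 0   # detections users flagged as wrong (false positives)

    @property
    def false_positive_rate(self) -> float:
        total = self.confirmed + self.rejected
        return self.rejected / total if total else 0.0


def active_methods(stats, max_fp_rate=0.2, min_votes=10):
    """Return the names of the methods that stay enabled.

    A method is dropped only once it has at least `min_votes` user verdicts
    and its false-positive rate exceeds `max_fp_rate`. Both thresholds are
    illustrative assumptions, not values from the post.
    """
    keep = []
    for name, s in stats.items():
        votes = s.confirmed + s.rejected
        if votes >= min_votes and s.false_positive_rate > max_fp_rate:
            continue  # enough evidence that this method is unreliable
        keep.append(name)
    return keep


# Made-up tallies for three hypothetical detection methods.
stats = {
    "script_src_match": MethodStats(confirmed=95, rejected=5),
    "dns_txt_match": MethodStats(confirmed=40, rejected=30),
    "nlp_mention": MethodStats(confirmed=1, rejected=2),
}
print(active_methods(stats))
```

The `min_votes` floor is there so that a method is not thrown out on the strength of two or three user reports.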

We think this approach helps create the most accurate dataset of its kind. We’ve done some internal benchmarking and feel really good about it.

We’re looking for feedback on how this can be better, and open to partnering with others who want to make use of this data for good.

3 comments

> SiteStacks can find the technology used at any domain

I punched in the URL of a website I built. It didn't have data, went out to get some, then reported back that it couldn't.

Meanwhile https://builtwith.com/reservations.camprrm.com worked.

Just tried this for our app, and it wrongly reported Mandrill and Flash (we're not using either of them). We used Mandrill a few years ago, so this might be some stale historical data, but the app never used Flash.

What's the URL? We can report back exactly why it was picked up.

Even if we're wrong, it's exactly this kind of feedback loop that's built into the product and ultimately helps make the data better for everyone else.

Could Mandrill still be listed in your DNS, for example in a TXT, SPF, or DKIM record?
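
One way to check for a leftover SPF entry is to scan the domain's TXT records for an `include:` that points at a Mandrill host. The sketch below works on record strings you have already fetched; the sample records are made up, and matching on `mandrillapp.com` is an assumption about what a leftover entry would look like, not a description of how SiteStacks detects it.

```python
def spf_includes(txt_record):
    """Return the domains named by include: mechanisms in one SPF TXT record."""
    parts = txt_record.strip().strip('"').split()
    if not parts or parts[0].lower() != "v=spf1":
        return []  # not an SPF record
    domains = []
    for term in parts[1:]:
        term = term.lstrip("+-~?").lower()  # drop an optional SPF qualifier
        if term.startswith("include:"):
            domains.append(term.split(":", 1)[1])
    return domains


def stale_senders(txt_records, needle="mandrillapp.com"):
    """List SPF include domains that contain `needle` (assumed marker)."""
    hits = set()
    for record in txt_records:
        hits.update(d for d in spf_includes(record) if needle in d)
    return sorted(hits)


# Made-up TXT records standing in for a real lookup.
records = [
    "v=spf1 include:_spf.google.com include:spf.mandrillapp.com ~all",
    "google-site-verification=abc123",
]
print(stale_senders(records))
```

The records themselves can be pulled with something like `dig +short TXT yourdomain.com`. A DKIM key lives under a selector subdomain, so it would need its own lookup.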

You already did a Show HN on this, though...

https://news.ycombinator.com/item?id=15249136