by yup_sto 656 days ago
Awesome, I will keep my eye on this for sure. I've spent the past few months tinkering with ingesting CT logs for bug bounty automation.

Curious if you're running your own CertStream server, or just continuously polling known CT logs with your own implementation.

I also noticed you are ingesting/storing flowers-to-the-world.com certs. Not sure what stage of optimization you are at, but blacklisting/ignoring these certs in my ingestion pipeline helped me avoid storing unnecessary data.

I'm not sure but I believe that's used by Google internally for testing purposes.

For example, if you search for "google", it returns 120k+ results, and these useless results are at the front.
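
For anyone wanting to do the same kind of filtering, here is a minimal Python sketch of dropping those names at ingestion time. The names `IGNORED_SUFFIXES`, `is_ignored`, and `filter_domains` are hypothetical and not taken from any pipeline in this thread; it assumes the pipeline sees plain hostnames, possibly with a `*.` wildcard prefix.

```python
from typing import Iterable, List

# Hypothetical ignore list: registrable domains whose certs we do not want to store.
IGNORED_SUFFIXES = ("flowers-to-the-world.com",)


def is_ignored(domain: str) -> bool:
    """Return True if the hostname is an ignored domain or one of its subdomains."""
    d = domain.strip().lower().rstrip(".")
    if d.startswith("*."):
        d = d[2:]  # treat a wildcard name like its base name
    # Match on a label boundary so "notflowers-to-the-world.com" is not caught.
    return any(d == s or d.endswith("." + s) for s in IGNORED_SUFFIXES)


def filter_domains(domains: Iterable[str]) -> List[str]:
    """Keep only the hostnames that are not on the ignore list."""
    return [d for d in domains if not is_ignored(d)]
```

Matching on the `"." + suffix` boundary rather than a bare `endswith` keeps unrelated domains that merely share the trailing string.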

> I also noticed you are ingesting/storing flowers-to-the-world.com certs. Not sure what stage of optimization you are at, but blacklisting/ignoring these certs in my ingestion pipeline helped me avoid storing unnecessary data.

The goal is to have something exhaustive, so I'll keep them. But you are right that I probably should not put them at the front. Not sure how important it is though, as these results shouldn't match many queries.

Exhaustive/Robust is the way for sure.

Minimizing storage was a priority for me since it's just a small side-project/automation.

I've looked for information on what the hell the `flowers-to-the-world` entries that pop up are and have found nothing. Curious what's going on there.

It's actually a Google thing!

I found this back when I wondered the same thing: https://medium.com/@hadfieldp/hey-ryan-c0fee84b5c39

Ahhhh, that tracks, cheers mate.

I am not using CertStream, as we'd lose data on the first network error. The way it's designed is more "rsync for CT logs" than a stream => storage system.
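
To illustrate the "rsync" idea in general terms (this is not MerkleMap's actual code, just a sketch under assumptions): keep a checkpoint of the next entry index, fetch fixed ranges, and only advance the checkpoint after a batch is stored, so a network error costs a retry rather than data. Here `fetch_entries` is a hypothetical callable standing in for an RFC 6962 `get-entries` request (`start`/`end` inclusive), and `tree_size` would come from `get-sth`. All function names and the checkpoint file format are made up for illustration.

```python
import json
import os
import time
from typing import Callable, List


def load_checkpoint(path: str) -> int:
    """Return the next entry index to fetch, or 0 if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return int(json.load(f)["next_index"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_index: int) -> None:
    """Write the checkpoint to a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def sync_log(
    fetch_entries: Callable[[int, int], List[object]],  # hypothetical: (start, end) inclusive
    tree_size: int,
    checkpoint_path: str,
    store: Callable[[int, List[object]], None],
    batch_size: int = 256,
    max_retries: int = 5,
    backoff_seconds: float = 1.0,
) -> int:
    """Pull entries [checkpoint, tree_size) in batches, resuming after errors."""
    next_index = load_checkpoint(checkpoint_path)
    retries = 0
    while next_index < tree_size:
        end = min(next_index + batch_size, tree_size) - 1
        try:
            entries = fetch_entries(next_index, end)
        except OSError:  # includes ConnectionError and TimeoutError
            retries += 1
            if retries > max_retries:
                raise
            time.sleep(backoff_seconds)
            continue  # retry the same range; nothing was skipped
        if not entries:
            break  # log returned nothing; stop and resume on the next run
        retries = 0
        store(next_index, entries)
        # Logs may return fewer entries than requested, so advance by what we got.
        next_index += len(entries)
        save_checkpoint(checkpoint_path, next_index)
    return next_index
```

Because the checkpoint only moves after `store` succeeds, restarting the process picks up at the first entry that was not yet stored, which is the property a fire-and-forget stream does not give you.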

Btw, you can get our feed like this:

    curl -N 'https://api.merklemap.com/live-domains?no_throttle=true'