I also noticed you are ingesting/storing flowers-to-the-world.com certs. Not sure what stage of optimization you are at, but blacklisting/ignoring these certs in my ingestion pipeline helped me avoid storing unnecessary data.
I'm not sure, but I believe that domain is used by Google internally for testing purposes.
For example, if you search for "google", it returns 120k+ results, and these useless results are at the front.
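For reference, a rough sketch of the kind of filter I mean. This is not my actual pipeline code; the names `IGNORED_SUFFIXES`, `is_ignored` and `should_store` are made up for illustration, and it assumes you already have the DNS names extracted from each cert.

```python
# Hypothetical ignore list; extend it with whatever test domains you find.
IGNORED_SUFFIXES = ("flowers-to-the-world.com",)


def is_ignored(name, suffixes=IGNORED_SUFFIXES):
    """True if a DNS name is an ignored domain or one of its subdomains."""
    name = name.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # treat wildcard names like their base domain
    # Match on a label boundary so "notflowers-to-the-world.com" is kept.
    return any(name == s or name.endswith("." + s) for s in suffixes)


def should_store(dns_names, suffixes=IGNORED_SUFFIXES):
    """Keep a cert unless every DNS name on it falls under an ignored domain."""
    if not dns_names:
        return True  # nothing to judge by, so keep it
    return not all(is_ignored(n, suffixes) for n in dns_names)
```

If you want to stay exhaustive, the same check could just as well push these certs down in ranking instead of dropping them.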
> I also noticed you are ingesting/storing flowers-to-the-world.com certs. Not sure what stage of optimization you are at, but blacklisting/ignoring these certs in my ingestion pipeline helped me avoid storing unnecessary data.
The goal is to have something exhaustive, so I'll keep them. But you are right that I probably should not put them at the front.
Not sure how important it is, though, as these results shouldn't match many queries.
I am not using CertStream, as we'd lose data on the first network error. The design is closer to "rsync for CT logs" than to a stream => storage system.
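To illustrate the "rsync" idea, here is a minimal sketch and not the real implementation. It assumes a log can be read by index range (RFC 6962 `get-entries` takes an inclusive `start` and `end`) and that progress is checkpointed after each batch. `fetch_entries`, `store` and the checkpoint file layout are hypothetical placeholders.

```python
import json
import os


def load_checkpoint(path):
    """Return the next entry index to fetch, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def sync_log(fetch_entries, store, tree_size, checkpoint_path, batch_size=256):
    """Copy entries from the checkpoint up to tree_size; safe to re-run after a failure."""
    next_index = load_checkpoint(checkpoint_path)
    while next_index < tree_size:
        end = min(next_index + batch_size, tree_size) - 1  # inclusive end index
        entries = fetch_entries(next_index, end)  # may raise on a network error
        if not entries:
            break
        store(next_index, entries)
        next_index += len(entries)  # a log may return fewer entries than requested
        save_checkpoint(checkpoint_path, next_index)
    return next_index
```

If a fetch fails, the checkpoint still points at the first entry not yet stored, so the next run picks up from there. A push-based stream has no such position to resume from.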
Curious if you're running your own CertStream server, or just continuously polling known CT logs with your own implementation.