Hacker News
by thephyber 718 days ago
I work in cybersecurity, and we need to get info about CVEs extremely quickly from all the major software vendors. That means we either subscribe to a site's webhooks (async notifications sent to us when an event happens) or poll the site. Polling is extremely inefficient, but it's what we end up doing most of the time, because most of the websites we watch don’t support webhooks/push-style notifications.

We create bot traffic, but we don’t want to. The problem is that the data we want isn’t available when we need it (we can’t wait days/weeks for the central CVE db to unembargo high-impact CVE records) and isn’t delivered to us. Instead, we have to go to a lot of effort to go get it, so we build a resilient crawler. Other similar companies/entities do too. Now we are all competing to get the same info in a short time window, so we all poll the sites too often, and that puts stress on the websites we hit.
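To put rough numbers on that stress, here is a back-of-the-envelope sketch. Every figure in it (50 crawlers, a 30-second poll interval, 2 advisories a day) is a made-up assumption for illustration, not a measurement from any real vendor site.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_requests_polling(pollers: int, interval_seconds: float) -> int:
    """Requests per day one site receives if every poller checks it on a fixed interval."""
    return int(pollers * (SECONDS_PER_DAY / interval_seconds))


def daily_deliveries_push(subscribers: int, events_per_day: int) -> int:
    """Outbound deliveries per day if the site pushed each event to each subscriber."""
    return subscribers * events_per_day


# Hypothetical inputs: 50 companies polling every 30 seconds, 2 new advisories a day.
polling = daily_requests_polling(pollers=50, interval_seconds=30)
push = daily_deliveries_push(subscribers=50, events_per_day=2)

print(polling, push)  # 144000 100
```

Under those assumptions, polling costs the site three orders of magnitude more requests than push would, and almost all of them return nothing new.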

All of this happens because the info should be open, but the companies that have it don’t want to build the most efficient system to distribute it. And there is probably legal liability for a middleman company that just crawls those websites and builds a shim webhook system to push data to subscribers as soon as it is found.
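Leaving the legal question aside, the mechanics of such a shim are simple. Below is a minimal sketch, not a description of any existing service: one poller fetches each source once, detects a change by hashing the response body, and fans the notification out to everyone subscribed. The `WebhookShim` class, the `fetch` and `deliver` callables, and the example URLs are all hypothetical names; in a real deployment they would wrap an HTTP client and need retries, rate limiting, and persistence.

```python
import hashlib
from typing import Callable, Dict, List


class WebhookShim:
    """Polls each source once on behalf of all subscribers and pushes changes to them."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],          # hypothetical: returns the page body for a URL
        deliver: Callable[[str, dict], None],   # hypothetical: POSTs a payload to a callback URL
    ) -> None:
        self.fetch = fetch
        self.deliver = deliver
        self.subscribers: Dict[str, List[str]] = {}
        self.last_digest: Dict[str, str] = {}

    def subscribe(self, source_url: str, callback_url: str) -> None:
        self.subscribers.setdefault(source_url, []).append(callback_url)

    def poll_once(self) -> int:
        """Fetch every watched source once; return how many notifications were sent."""
        notified = 0
        for source_url, callbacks in self.subscribers.items():
            digest = hashlib.sha256(self.fetch(source_url)).hexdigest()
            if self.last_digest.get(source_url) == digest:
                continue  # nothing changed since the last poll
            first_seen = source_url not in self.last_digest
            self.last_digest[source_url] = digest
            if first_seen:
                continue  # first fetch only records a baseline
            for callback_url in callbacks:
                self.deliver(callback_url, {"source": source_url, "sha256": digest})
                notified += 1
        return notified
```

The point of the design is that the vendor site sees one request per poll interval no matter how many companies subscribe, instead of one request per company.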