by jiggawatts
1412 days ago
Windows has something like 15,000 performance counters and error metrics that can be collected. There isn’t a monitoring system on earth that can even approach collecting all of them. At scale, I have to pick and choose maybe 20-100 counters for fear of overloading a cluster(!) of servers collecting the data… once a minute. That’s because the protocol overheads cause “write multiplication” of a hundred-to-one or worse: every byte of metric ends up as nearly a kilobyte on the wire. Meanwhile, I did some experiments that showed that with even a tiny bit of crude data-oriented design and delta compression, a single box could collect 10K metrics across 10K endpoints every second without breaking a sweat. The modern REST / RPC approach is fine for business apps but is an unmitigated disaster for collecting tiny metrics. Set your goals higher than collecting a selected 1% subset of the available metrics, 60x less frequently than admins would like…
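To make the delta compression point concrete, here is a rough sketch of the general idea, not the code from my experiments. It assumes both ends already agree on a fixed counter order, so no names or JSON keys go on the wire, and it sends only the change since the last sample as a zigzag varint. The function names (`encode_deltas`, `decode_deltas`) and the sample values are made up for illustration.

```python
from typing import List, Tuple


def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small negatives stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)


def put_varint(out: bytearray, z: int) -> None:
    """Append an unsigned int using 7 bits per byte, high bit set on all but the last byte."""
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7


def encode_deltas(prev: List[int], curr: List[int]) -> bytes:
    """Encode curr as per-counter deltas against prev (same order, same length assumed)."""
    out = bytearray()
    for p, c in zip(prev, curr):
        put_varint(out, zigzag(c - p))
    return bytes(out)


def decode_deltas(prev: List[int], payload: bytes) -> List[int]:
    """Rebuild the current sample from the previous sample plus the delta payload."""
    curr = []
    pos = 0
    for p in prev:
        z, pos = get_varint(payload, pos)
        curr.append(p + unzigzag(z))
    return curr


# Hypothetical sample: three counters, most of which barely move between polls.
prev_sample = [1000, 500000, 42]
curr_sample = [1003, 500000, 40]
payload = encode_deltas(prev_sample, curr_sample)
```

Counters that didn’t change or only moved a little cost one byte each here, which is the opposite end of the spectrum from wrapping every value in its own request with headers and field names. A real collector would also need framing, resync after a dropped sample, and handling for counters that appear or disappear, none of which is shown.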