API Monitoring: Up Is Not Enough (2014)


	API Monitoring: Up Is Not Enough (2014) (pagerduty.com)
	33 points by heitortsergent 3415 days ago

2 comments

elevensies 3415 days ago

This reminds me of Steve Yegge's Google platforms rant, which I'm sure I've posted and reposted before:

monitoring and QA are the same thing. You'd never think so until you try doing a big SOA. But when your service says "oh yes, I'm fine", it may well be the case that the only thing still functioning in the server is the little component that knows how to say "I'm fine, roger roger, over and out" in a cheery droid voice. In order to tell whether the service is actually responding, you have to make individual calls. The problem continues recursively until your monitoring is doing comprehensive semantics checking of your entire range of services and data, at which point it's indistinguishable from automated QA. So they're a continuum.

https://gist.github.com/chitchcock/1281611
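A minimal sketch of the distinction the quote draws, contrasting a liveness-only check with one that inspects what the service actually returned. The payload shape (an `items` field) and both function names are hypothetical, chosen only for illustration; they are not from the rant or the article.

```python
import json

def is_up(status_code: int) -> bool:
    # Liveness only: passes as long as something answers with a 200.
    return status_code == 200

def is_working(status_code: int, body: str) -> bool:
    # Semantic check: the response must also parse and carry real data.
    # The "items" field is an assumed payload shape, not a real API.
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and bool(payload.get("items"))
```

A server that answers 200 with the body "I'm fine" passes `is_up` but fails `is_working`, which is the gap the quote describes.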

In smaller projects I've worked on, I'm not willing to expend the level of effort required for truly comprehensive monitoring, but I try to do some kind of end-to-end test of the whole system as part of the monitoring -- something that requires all components to be alive and working -- so that I'm aware of issues at least as fast as the people using the system.
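One way such an end-to-end check could be structured, as a rough sketch: a chain of steps where each step consumes the previous step's result, so the check only passes if every component works. The `run_end_to_end` helper and the step names in the usage note are assumptions made up for this example.

```python
from typing import Callable, List, Optional, Tuple

# Each step is a (name, callable) pair. The callable receives the previous
# step's result and returns its own, raising an exception on failure.
Step = Tuple[str, Callable[[object], object]]

def run_end_to_end(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return None if all pass, else a failure message."""
    result: object = None
    for name, step in steps:
        try:
            result = step(result)
        except Exception as exc:
            # Stop at the first failure and report which step broke.
            return f"{name} failed: {exc}"
    return None
```

A real check might chain something like "create a record through the API", "read it back", and "delete it", and alert whenever the return value is not `None`.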


ryen 3415 days ago

Some shops call this "synthetic monitoring": the tests are a subset of a full integration suite, small enough to run at regular, frequent intervals of, say, every 10 or 15 minutes. They cover "the happy path" for some typical use cases.

This monitoring shouldn't replace other system-level monitoring, which can alert on abnormal memory use, disk space, error rates, etc.
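As a rough illustration of the system-level side, a threshold check might look like the sketch below. The metric names and limits are invented for the example, and skipping metrics that were not reported is a simplification a real setup would probably want to revisit.

```python
from typing import Dict, List

def find_breaches(metrics: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    # Return the names of metrics whose value exceeds its limit, sorted by name.
    # Metrics with no reported value are skipped (an assumption of this sketch).
    return sorted(
        name for name, limit in limits.items()
        if name in metrics and metrics[name] > limit
    )
```

Such a check runs alongside the synthetic tests rather than instead of them: the synthetic test says the happy path broke, and the breached metrics hint at why.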


dasil003 3415 days ago

There's also an analogous split to unit testing vs integration/system testing. You want system-wide monitoring to give the strongest guarantee that a service is actually up and available to customers, but you also want monitoring on each component so you can pinpoint the source of failures more quickly.
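A small sketch of how the two levels might be combined when reporting: the system-wide result decides whether customers are affected, and the per-component results narrow down where to look. The `diagnose` function and its status strings are hypothetical.

```python
from typing import Dict

def diagnose(system_ok: bool, components: Dict[str, bool]) -> str:
    # Collect the components whose own checks are failing.
    failed = sorted(name for name, ok in components.items() if not ok)
    if system_ok and not failed:
        return "healthy"
    if system_ok:
        # Customers are still served, but something underneath is unhealthy.
        return "degraded: " + ", ".join(failed)
    if failed:
        # The system-wide check failed and component checks point at suspects.
        return "down, suspect: " + ", ".join(failed)
    # The system-wide check failed but every component claims to be fine.
    return "down, cause unknown"
```

The last case is the one that system-wide monitoring exists to catch: every component reports healthy, yet the service as a whole is not working.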


johns 3415 days ago

Author here. It's been a while since I wrote this and we've learned a ton more about what good API monitoring looks like. So if you have any questions, let me know!


andyfleming 3415 days ago

Are there any updated resources you've written or recommendations based on what you've since learned?


dozzie 3415 days ago

Now that we have an idea of what API monitoring looks like, how would you monitor documentation?


nytopop 3415 days ago

With documentation documentation.
