by moabuaboud 1221 days ago
Are there any programmatic methods to detect integration breakages? Simply relying on unit/integration tests against mock endpoints is not effective, since mocks cannot capture changes made by third parties. Being notified of breakages is a solution, but not an ideal one.

I've considered sourcing open API specs, like APIsguru.com, to scan for changes, but I was wondering if you have any other suggestions.
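For what it's worth, the spec-scanning idea could be sketched roughly as below. This is only a minimal sketch, assuming you already have two snapshots of the same OpenAPI document loaded as dicts (how you fetch and store them is up to you). The function name `find_breaking_changes` and the three checks it performs are my own illustration, not an established tool, and it ignores `$ref` resolution, response schemas, and path-level parameters.

```python
# Hypothetical helper: compares two snapshots of an OpenAPI document
# (already parsed into dicts) and lists changes likely to break a client.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def required_params(operation: dict) -> set[str]:
    """Names of inline parameters marked required on one operation."""
    return {
        p["name"]
        for p in operation.get("parameters", [])
        if p.get("required") and "name" in p  # skips unresolved $ref entries
    }


def find_breaking_changes(old_spec: dict, new_spec: dict) -> list[str]:
    changes = []
    old_paths = old_spec.get("paths", {})
    new_paths = new_spec.get("paths", {})
    for path, old_item in sorted(old_paths.items()):
        new_item = new_paths.get(path)
        if new_item is None:
            changes.append(f"removed path: {path}")
            continue
        for method in sorted(old_item):
            if method not in HTTP_METHODS:
                continue  # path items also hold keys like "parameters" or "summary"
            if method not in new_item:
                changes.append(f"removed operation: {method.upper()} {path}")
                continue
            added = required_params(new_item[method]) - required_params(old_item[method])
            for name in sorted(added):
                changes.append(
                    f"new required parameter: {method.upper()} {path} -> {name}"
                )
    return changes


old = {"paths": {"/users": {"get": {"parameters": []}, "post": {}},
                 "/orders": {"get": {}}}}
new = {"paths": {"/users": {"get": {"parameters": [
    {"name": "tenant", "in": "query", "required": True}]}}}}

for change in find_breaking_changes(old, new):
    print(change)
```

Run on a schedule against each provider's published spec, something like this only catches changes the provider actually documents, so it complements runtime monitoring rather than replacing it.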

3 comments

It has been almost a decade since I've been involved at such an org, so I cannot speak to the state of the art. Folks currently doing the schlep are going to be more knowledgeable than I am. With that said, you are likely to be alerted to trouble with an integration in a few ways:

1. Exception handling. Whatever code you have polling API endpoints or processing inbound webhooks is likely to throw an exception if the structure of the API it's consuming, or the format of inbound webhook messages, has changed materially. I recommend routing these exceptions to Sentry or a similar application error reporting mechanism for triage by your SRE or platform team.

2. API responses. There is some peril here, as every API is different. Some APIs will behave as you'd expect with respect to error codes, error messages, and request allowances, while some APIs will reply with code 200 and put the error message in the body. Again, this is the value incumbents offer; they know what failure looks like for each API, and they also have a good idea of what success and health look like at steady state. Build relationships with API partners (do you have a partner team? you eventually should) so that you have open comms with them with regard to breaking changes, and code defensively in general. Tangentially, ensure you have robust logic around deduplication of polling data.

3. User reporting. If your unit tests didn't catch something, nor did your application error mechanisms, your users will absolutely let you know if a JSON element landed where it shouldn't have in a target integration.
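To make points 1 and 2 a bit more concrete, here is a minimal sketch of a defensive response check. It is not tied to any particular API: the function name `check_response`, the `error`/`errors` body keys, and the `required_fields` mapping are all assumptions for illustration, and each real API will need its own notion of what failure looks like.

```python
import json


def check_response(status_code: int, body_text: str,
                   required_fields: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the response looks healthy."""
    problems = []
    if not 200 <= status_code < 300:
        problems.append(f"unexpected status: {status_code}")
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        problems.append("body is not valid JSON")
        return problems
    if not isinstance(payload, dict):
        problems.append("body is not a JSON object")
        return problems
    # Assumed convention: some APIs answer 200 but report failure in the body.
    for key in ("error", "errors"):
        if payload.get(key):
            problems.append(f"error reported in body under '{key}'")
    # Structural drift: fields we depend on that vanished or changed type.
    for field, expected_type in required_fields.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return problems


print(check_response(200, '{"error": "rate limited"}', {"id": int}))
```

Anything this returns could be sent to your error reporting tool as a single event per integration, so a schema change shows up as one alert instead of a pile of unrelated stack traces.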

I'd encourage you to ask others in this space, as their recent knowledge will be more relevant for avoiding sharp edges. Also, once you've built whatever you're building, you'll be able to observe at scale what optimal and suboptimal look like (or, at least, you should be able to if you've approached this from a systems thinking perspective and wrapped the necessary telemetry and observation tools around the machine).

Are you thinking of misc. sync errors here (auth issues, temporary downtime, etc.) or API changes?

For API changes: It's not super scalable, but I've used a changelog-monitoring chat channel (Slack/Zulip/Discord) that receives automated notifications for changelog updates from all integrated APIs. Maybe there's an assigned person for each integration who scans the changelog updates for their assigned services, then emoji-reacts to the notification messages once they have been reviewed and cleared (they create issues for required changes or confirm no changes are needed).
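If a provider doesn't offer a feed for its changelog, one way to drive those notifications is to fingerprint the page text and compare it against the last fingerprint you saw. A rough sketch, assuming you fetch the changelog text yourself and post to chat separately; the names `fingerprint` and `detect_changelog_updates` and the service names are made up for illustration:

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the changelog text, ignoring leading/trailing whitespace per line."""
    normalized = "\n".join(line.strip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changelog_updates(seen: dict[str, str],
                             fetched: dict[str, str]) -> list[str]:
    """Return services whose changelog differs from the stored fingerprint.

    `seen` maps service name -> last known fingerprint and is updated in place.
    Note that a service seen for the first time is reported as updated.
    """
    updated = []
    for service, text in sorted(fetched.items()):
        digest = fingerprint(text)
        if seen.get(service) != digest:
            updated.append(service)
            seen[service] = digest
    return updated


seen: dict[str, str] = {}
first = detect_changelog_updates(seen, {"billing-api": "v1\n", "crm-api": "v2"})
second = detect_changelog_updates(seen, {"billing-api": "v1  ", "crm-api": "v2"})
third = detect_changelog_updates(seen, {"billing-api": "v1\nv1.1", "crm-api": "v2"})
print(first, second, third)
```

Each name returned would become one message in the channel for the assigned person to review. Hashing the whole page is crude (any cosmetic edit triggers a notification), which is part of why this approach doesn't scale well.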

That's an innovative idea, I like it! Was it straightforward to set up automated notifications for the other services? Did you encounter any challenges or problems along the way?