by jkodumal 1785 days ago
We've been using Sleuth at LaunchDarkly as a single pane of glass for all changes going out into production. The addition of DORA metrics is exciting: as an engineering leader, I've found Accelerate (https://itrevolution.com/accelerate-book/) to be one of the few books where the practices described meet reality. Having a tool track those metrics is a welcome addition.

One thing I'm curious about is Sleuth's approach to the "garbage in, garbage out" problem: if I'm tracking MTTR, for example, I've found that our teams aren't always perfect about recording the start and end times of an incident. If the data is incorrect, can I correct it manually?

1 comment

That's a great question. Change failure rate and MTTR are the two metrics teams have the most trouble getting a real handle on. We've found that different teams define change failure and MTTR in widely differing ways: some customers just want to track incidents, whereas others use team KPIs as their definition of failure.

Today, you can manually update the status of a deploy to incident, rollback, unhealthy or ailing. This allows you to "correct" data that Sleuth may have gotten wrong via its integrations with Datadog or your incident management system. Right now the correction is at the deploy level. However, finer-grained control is coming soon that will let you override any period of time as having been in a specific state.
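
To make the deploy-level correction concrete, here is a rough Python sketch of how a manual status override could feed into change failure rate. This is not Sleuth's API or data model: the `Deploy` class, the `manual_status` field, the "healthy" status, and the choice to count all four statuses as failures are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: every one of these statuses counts as a failed change.
# Teams define failure differently, so adjust this set to taste.
FAILURE_STATUSES = {"incident", "rollback", "unhealthy", "ailing"}


@dataclass
class Deploy:
    """Hypothetical deploy record; not Sleuth's actual data model."""
    name: str
    detected_status: str                  # status inferred from integrations
    manual_status: Optional[str] = None   # human correction, if any

    @property
    def effective_status(self) -> str:
        # A manual correction always wins over the detected status.
        if self.manual_status is not None:
            return self.manual_status
        return self.detected_status


def change_failure_rate(deploys: list[Deploy]) -> float:
    """Fraction of deploys whose effective status is a failure."""
    if not deploys:
        return 0.0
    failed = sum(1 for d in deploys if d.effective_status in FAILURE_STATUSES)
    return failed / len(deploys)


deploys = [
    Deploy("deploy-1", "healthy"),
    Deploy("deploy-2", "healthy"),
    Deploy("deploy-3", "incident"),
    Deploy("deploy-4", "healthy"),
]

before = change_failure_rate(deploys)

# The integrations missed a rollback on deploy-2, so correct it by hand.
deploys[1].manual_status = "rollback"

after = change_failure_rate(deploys)
```

With one of four deploys detected as an incident, `before` is 0.25; after the manual correction marks a second deploy as a rollback, `after` is 0.5. Keeping the detected and manual statuses in separate fields means a correction never destroys the original signal, which makes it easy to audit how often the integrations get it wrong.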