Hacker News
by OJFord 2292 days ago
What would your alert be for # open PRs (an example in the demo linked from posted page)? How often would it fire?

Whatever the answer, that's a different thing from this. Both have their place.

2 comments

If you just want a nice visualization for looking at some numbers, fine. But if you want to detect problems, it's ineffective. I've seen too many companies try to actually monitor the state of things and find problems with charts, numbers, traffic lights, etc.
You can do both, especially at the beginning of a system's lifecycle, when you don't really understand its behavior yet. Lots of times, people wandering by have said hmm, that doesn't seem right… Later, as we learned more, these hunches evolved into more advanced automated alarms.
But that's my point: it isn't for alerting about problems. Some things have a 'status' that might be interesting but isn't a problem, or isn't necessarily something to fire an alert on.

You could have unintrusive notifications (inaudible etc.) to 'alert' to such statuses I suppose, if they were kept in view and not 'dismissed' (whatever that means for the medium they came in) - but then really you're just implementing a version of something like this Monitoror in your inbox, phone notification tray, Telegram channel, or whatever.

You're not going to rip out logging, Prometheus, or the UIs of the services this connects to just because you have alerting, so I don't see why you would drop this either. It's like Prometheus & Grafana for higher-level stuff. (Of course you could use those tools for this sort of monitoring too, but that's not really the point.)

A "nice visualization" is not necessarily just a "pretty"/"shiny" thing to show off to people. Human beings are highly visual creatures with outstanding visual pattern recognition abilities. Maybe you personally don't get anything out of them but the value of visualization is proven. Here are a few sources to get you started: https://www.csgsolutions.com/blog/15-statistics-prove-power-...
Why do you want a display of open PRs at all?

I think the fundamental question of all such tools is "Why are we watching this, and what are we looking for," and there are limited but nonzero good reasons to have a display. "Someone should look at open PRs if there are too many" is a bad one - the number doesn't tell you about the urgency of the existing PRs. If you want to respond promptly, respond to all of them promptly.

"We need to know if we're falling behind" is a possible reason to create an alert, not a dashboard. If you really want people to drop what they're doing and triage issues if there are too many, make an alert. If you don't, you'll just get a rectangle that turns red at some point and train people to ignore red rectangles on the board. (Relatedly: I added a pageable alert to my team a few years back to check whether there are a large number of non-pageable alerts, because it usually means something has gone wrong at a low level and we should investigate urgently. It's worked out pretty well, but the alert looks only at tickets created by our monitoring systems, not at tickets created by humans.)
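A minimal sketch of that kind of meta-alert, to make the idea concrete. Everything here is an assumption for illustration: the `Ticket` fields, the `"monitoring"`/`"human"` source labels, and the threshold of 10 are hypothetical, not the actual setup described above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Ticket:
    title: str
    source: str      # hypothetical labels: "monitoring" or "human"
    pageable: bool   # True if the ticket already pages someone


def should_page(tickets: Iterable[Ticket], threshold: int = 10) -> bool:
    """Return True when the number of open non-pageable tickets created by
    monitoring reaches the threshold. Human-created tickets are ignored."""
    noisy = [t for t in tickets if t.source == "monitoring" and not t.pageable]
    return len(noisy) >= threshold
```

The point of filtering on `source` is that a pile of low-priority automated tickets suggests a shared low-level failure, while a pile of human-filed tickets does not.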

"We need to see if we're getting worse" is a reason to have managers review graphs periodically, not a reason for anyone to stare at a single display. You can't track long-term trends from a status board.

"I need to see what to work on" is a valid reason, but much more useful in the form of a website you can visit on your own computer with links to PRs, not a raw number on a TV screen. (My team has a TV showing open tickets in our queue, both support tickets and automated alerts, but we all have an equivalent link locally, too. Showing the names of tickets is useful for "Hey teammate, can you look at the second ticket there? Sounds related to a thing you were working on.")

I'd say there are roughly two useful cases for screens like this. One is to show to internal customers, so they say "oh, service X is yellow, so the slowness I'm seeing isn't just me, I'll do something else for a while." But those screens aren't primarily for the team that owns the product; they're for teams that depend on the product. (Such status boards can be either automated or manual.) The other is to show graphs of various metrics to see abnormal behavior, with the idea that no action is ever triggered by someone looking at the graph, but if you're already investigating something, it's useful to say "Hey, that's funny, this other thing spiked at about the same time even though it's within acceptable limits" and then you have a clue for investigation.

> Why do you want a display of open PRs at all?

All PRs are WIP, and minimizing WIP is very valuable in product development processes. See Reinertsen's The Principles of Product Development Flow for the math, but basically high/unpredictable latency drastically limits the pace of learning and causes a lot of upstream thrash and waste.

I remember talking with one team at the bird-themed social media company that was frustrated with slow PRs; they dropped average delay from 3-4 days to under 4 hours. They said it made a huge experiential difference and they loved the change.

Yes, I understand why you'd want to focus on bringing down the number of open PRs. I agree that keeping that number down is good. My question is why you want to put this on a TV screen.

If you want people to focus on open PRs, tell them to open GitHub on their computers; don't tell them to look up at a TV screen periodically. Treat it like alerts: you have a list of open things to deal with and you need to get that number to zero. There's no threshold greater than zero of a long-term acceptable number of open PRs.

If the problem is that they have other things to look at too, installing yet another TV screen won't solve that; your team needs to make the management decision of what to prioritize. Options include making a unified dashboard of incidents/alerts/PRs/support tickets (and encoding which ones sort to the top), setting up a PR review rotation (i.e., for one week, completing reviews is your top priority barring all-hands-on-deck incidents), treating open PRs as alerts and escalating them if nobody replies within 4 hours, removing other work by deciding you'll deprioritize low-impact alerts (and hoping that the increased development velocity ends up solving problems), etc.
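As a rough sketch of the "treat open PRs as alerts" option, assuming you already have the PR data from somewhere: the `OpenPR` record and its fields are hypothetical, and fetching PRs from GitHub or paging anyone is left out entirely. Only the 4-hour window comes from the paragraph above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class OpenPR:
    number: int
    opened_at: datetime
    first_reply_at: Optional[datetime] = None  # None means nobody has replied


def prs_to_escalate(
    prs: Iterable[OpenPR],
    now: datetime,
    max_wait: timedelta = timedelta(hours=4),
) -> List[OpenPR]:
    """Return open PRs that have had no reply for longer than max_wait."""
    return [
        pr for pr in prs
        if pr.first_reply_at is None and now - pr.opened_at > max_wait
    ]
```

You would run something like this on a schedule and hand the result to whatever escalation path your alerts already use.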

The notion with information radiators is not that you tell people to look up. The notion is that people naturally look at things while walking around or when idle, so it's valuable to make important things visible. It also serves as a way to trigger and focus discussions.

For example, consider the Kanban board. Here's one I built a while back: http://williampietri.com/writing/2015/the-big-board/

We loved having a physical map of what we were up to. We'd have our daily stand-up around the board and discuss it. You'd know when something was completed, because you'd see somebody move a card. I would often know when the product manager was thinking about something, because he'd go over to look right at it. That often sparked conversations. And we'd all have a feel for how work was flowing, something we'd talk about in our weekly retro.

Could this have been replicated with a system of alerts? No. Alerts are interruptive and necessarily threshold-driven. I don't want my people caught in a cycle of continuous reactivity to things that at some point in history were seen as important enough to configure an alert. Except for emergencies, I want them to be serene, thoughtful, and proactive, which is very hard to achieve if you're continuously juggling alerts.

So I'd put up something with PR stats if it were something I wanted us to be aware of. Especially so if it were an item of concern in previous retros. Maybe that would eventually lead to an alert (although I'd hope not). But the first step in solving a problem is understanding the problem, and I think information radiators are great for that, especially when problems are thorny and don't have obviously correct answers.

That's fair - I think part of it is also that you don't really have a green vs. red state (which is a good part of what I object to in the demo presentation), you just have a general feel, and no specific state is defined as an actual problem. (And most of what you're trying to achieve is a shared sense of what's being done, which is very different from a shared sense of what's broken and needs fixing.)