by dkarl 4886 days ago
You're really stretching the meaning of, "They want our software to tell them what to do. Period." In the real world, as you mention, it makes sense to tell people what to do, because humans usually make better and cheaper robots than robots do, but in computing you only need humans for two things: physically interacting with hardware, and human judgment. In my admittedly limited experience in this domain, what people wanted was alerts that indicated that human investigation and judgment were needed. The systems I worked on only ever told humans to do one thing: "Look! Look at this!" If the system knew more, it did it. (A nice feature of one system I worked with was that it automated the routine aspects of investigation, so when an alert was generated, relevant information for the type of alert such as traceroute output and previous alerts related to a host was gathered and attached.)
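To make the "gather and attach the relevant information" idea concrete, here is a minimal sketch of what that might look like. It is not the system I worked with; `Alert`, `COLLECTORS`, `enrich`, and both collector functions are hypothetical names, and the collectors return placeholder strings rather than running real tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    host: str
    kind: str
    message: str
    # Context gathered automatically before a human ever sees the alert.
    context: Dict[str, str] = field(default_factory=dict)


# Hypothetical collectors. A real system would shell out to traceroute,
# query an alert history store, and so on; these only return placeholders.
def traceroute_output(alert: Alert) -> str:
    return f"traceroute to {alert.host}: (output would be attached here)"


def previous_alerts(alert: Alert) -> str:
    return f"previous alerts for {alert.host}: (history would be attached here)"


# Which routine investigation steps apply to which type of alert (assumed mapping).
COLLECTORS: Dict[str, List[Callable[[Alert], str]]] = {
    "host_unreachable": [traceroute_output, previous_alerts],
}


def enrich(alert: Alert) -> Alert:
    """Run every collector registered for this alert type and attach the results."""
    for collector in COLLECTORS.get(alert.kind, []):
        alert.context[collector.__name__] = collector(alert)
    return alert


alert = enrich(Alert(host="db1", kind="host_unreachable", message="ping failed"))
```

The only instruction to the human is still "look at this"; the software has just done the boring half of the looking first.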

Bottom line: you never intentionally waste a human's time on things that don't require human judgment, and when human judgment comes into play, people want context. Good software helps people establish context for the decisions they make. The bike barometer, for example, doesn't actually tell you whether to bike or not. It just lets you know how a couple of factors balance against each other. The person reading the barometer will factor in other context such as how they feel that morning, whether they want some exercise that morning or would rather read a book on the tube, whether the bike is in good working condition, and so on.

Granted, the systems I worked with were mostly used by full-time operations personnel. If you're talking about a system for non-specialists who may need prompting and guidance along the lines of, "It looks like you're trying to handle more load than usual. Would you like to spin up a few more servers?" then I guess I can see people wanting the software to give them explicit orders.


One thing I worry about with something like this is that if you hide the detection and fixing behind layers of automation, things that are actually broken (or at least buggy, as in my example below) might get missed.

For example, say you have a simple service that needs to be restarted every ~2 weeks because of a memory leak. If a human is in charge of this, then after having to go in and restart the service a few times, they'll know that something isn't right. If it's automated, though, the computer won't have this intuition. What if the leak depends on the number of requests served, and your traffic is sporadic, so the first restart comes after 2 weeks, the next after 3 days, and the next after a month?

Automated interventions should be tracked as data in the system, so you can set thresholds and create alerts and reports on them just like you can on anything else.
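As a rough sketch of what "tracked as data" could mean, assuming a hypothetical `InterventionLog` class and made-up service and action names: every automated action is recorded with a timestamp, and an alert condition fires when the same action on the same service recurs too often within a window. The sketch assumes events are recorded in time order.

```python
from collections import deque
from datetime import datetime, timedelta


class InterventionLog:
    """Records automated interventions and flags ones that recur too often."""

    def __init__(self, window: timedelta, threshold: int):
        self.window = window
        self.threshold = threshold
        self.events = deque()  # (timestamp, service, action), oldest first

    def record(self, when: datetime, service: str, action: str) -> bool:
        """Store one intervention; return True if a human should be alerted."""
        self.events.append((when, service, action))
        # Drop events that have aged out of the window.
        cutoff = when - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
        recent = sum(1 for _, s, a in self.events if s == service and a == action)
        return recent >= self.threshold


log = InterventionLog(window=timedelta(days=60), threshold=3)
start = datetime(2024, 1, 1)
log.record(start, "foo", "restart")                                # first restart
log.record(start + timedelta(days=14), "foo", "restart")           # two weeks later
alert = log.record(start + timedelta(days=17), "foo", "restart")   # three days later
```

Counting occurrences in a window, rather than looking for a fixed interval, means the sporadic case (2 weeks, then 3 days, then a month) still trips the threshold as long as the window is wide enough.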

Sometimes things fly under the radar, though, and that's where the dashboard-style "situational awareness" UIs really shine. Typically the people asking for a "dashboard" are executives who really shouldn't care, who only need regular briefings plus an occasional text from someone in operations warning them of a major customer impact. The people who benefit from them are engineers who browse around the system looking for trouble or simply satisfying their curiosity. "The Foo servers handle the Bar requests. I wonder what their typical CPU utilization is. I'll go check one of them... click click click. Whoa, that memory usage doesn't look good. I wonder if it's always like that. click WTF is this erratic sawtooth pattern? Do the other Foo servers have this, too? click click click Yeesh, somebody needs to fix that." That's the ideal case, anyway, if you have a rich UI that is good at presenting pages of data in context that can be understood at a glance, with quick navigation to related data. If the engineer clicks the "CPU utilization" button and gets back a line graph and a table of numbers, with no other context, then the UI is forcing the engineer to have tunnel vision. It should be dashboards all the way down, until the engineer starts running custom queries that the system doesn't know how to provide context for.

But yeah, the chronic restarting scenario should show up in reports and hopefully trigger an alert. I imagine that routine interventions (such as spinning up extra servers for load) and troubling interventions (such as restarting a service) are distinguished in reporting.
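For what it's worth, the routine/troubling split could be as simple as the following sketch. The `TROUBLING` set and the action names are assumptions for illustration, not how any particular product classifies things.

```python
from collections import Counter

# Assumed classification: actions that paper over a fault rather than
# respond to ordinary demand. Anything not listed is treated as routine.
TROUBLING = {"restart_service", "kill_process"}


def summarize(interventions):
    """Group (service, action) pairs into routine and troubling counts."""
    report = {"routine": Counter(), "troubling": Counter()}
    for service, action in interventions:
        bucket = "troubling" if action in TROUBLING else "routine"
        report[bucket][(service, action)] += 1
    return report


report = summarize([
    ("foo", "restart_service"),
    ("web", "scale_up"),
    ("foo", "restart_service"),
])
```

A report built this way puts the repeated restarts of `foo` in their own section instead of burying them among the scale-up events.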

In the narrow context of software that exists entirely to deal with other software, you're probably right. But I build software that mainly optimizes non-virtual processes. These processes often require human oversight due to laws or safety regulations. Even in the cases where I'm building software where regulations don't get in the way, the real world is usually too messy for the computer to automatically take action because it would need to control a robot much more advanced than current technology allows for.

Ah, when you said "web analytics" I think I got confused between analytics delivered via a web app and analytics about the systems implementing a web app.