by aswerty 672 days ago
> In general, mature engineers are comfortable with working within some nonzero amount of uncertainty and risk

Taking just that sentence as a snapshot: I find the opposite problem is more relevant in the software field, namely being solicited for an estimate on something where the certainty and predictability of what is being built approach zero.

There is no doubt the "softness" of software engineering as opposed to other forms of engineering is very distinct. To the point where there is an overarching question on whether it is engineering at all. This has resulted in the iterative Agile development process competing with, if not overtaking, the Waterfall development process that exists in other engineering disciplines.

And in software "engineering" the practical steps of construction are as intellectual an activity as the design, whereas in other disciplines the design is considered an intellectual activity and the implementation is not.

I'm not going anywhere particular with this train of thought - other than surfacing the risks in comparing software development to traditional engineering.


A frequent form of lack of tolerance for risk I’ve seen is not being able to make a speed vs quality trade-off. One example:

We were rolling out a change that had a small risk that we’d have to manually reboot a couple of machines. The total disruption to business would’ve been less than $10k for sure. I had to fight people who wanted to spend 3 months writing one-off tooling to lower the chance of it happening. Madness!

Fairly common, especially with mission-critical infrastructure like databases. There's an implicit assumption, by engineers and managers alike, that 100% uptime is the gold standard and anything less is a failure. It takes a rational engineer (in the context of this discussion, usually a "senior") to point out that a) SLAs never promise 100%, b) the rest of the infrastructure that comprises the system has only a few nines of availability anyways, c) the engineering cost of getting from 99.99% to 100% is orders of magnitude higher than getting to 99.999%. In other words: senior engineers should be able to contextualize engineering work and do tradeoff analysis; they provide value not by doing more work but by skipping the expensive, low-impact work.
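
To put rough numbers on the "nines" mentioned above, here is a minimal sketch of the downtime budget each availability target allows. It assumes a 365-day year, and the function name and the list of targets are illustrative, not taken from any particular SLA.

```python
# Downtime budget per year for a given availability target.
# Assumption: a 365-day year; the targets below are illustrative only.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return how many minutes per year a service may be down."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under that assumption, 99.99% leaves roughly 52.6 minutes per year and 99.999% roughly 5.3, while 100% leaves nothing at all, which is why the last step is the one that gets disproportionately expensive.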

Is that $10k in lost sales/product during the downtime, or $10k plus the cost of IT standing things back up, verifying, and resyncing, plus the cost of other departments manually fixing adjacent things that broke?

People who don't work daily in infra tend to not understand that downtime like that can have massive ripple effects. That one server, unknown to you, might have tentacles that reach all over the company. It might generate 100 tickets that now need to be verified by various IT personnel over the next few days in addition to their likely already full workload. It might have fucked up backups, DFS, patching cadence etc etc.

Sure, the approach I advocated for can have much worse consequences in general. However, in this particular case it was ~impossible for the outage to get that costly - we operated the servers and knew the blast radius. My estimate was for the total cost.

Also, 2 engineers working for 3 months cost a ton of money, not even counting the opportunity cost of other things they could’ve been doing. Even if the potential outage cost were closer to $100k, I’d likely stick with my decision.
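
The trade-off can be sketched as a back-of-the-envelope comparison. Only the $10k outage figure, the 2 engineers, and the 3 months come from the comments above; the outage probability and the loaded monthly cost per engineer are hypothetical placeholders, so treat this as an illustration rather than the actual numbers involved.

```python
# Compare the expected cost of accepting the risk with the cost of mitigating it.
# From the thread: worst-case outage cost ($10k), 2 engineers, 3 months.
# Hypothetical assumptions: OUTAGE_PROBABILITY and MONTHLY_COST_PER_ENGINEER.
WORST_CASE_OUTAGE_COST = 10_000
OUTAGE_PROBABILITY = 0.1            # assumed; the thread only says "a small risk"
MONTHLY_COST_PER_ENGINEER = 15_000  # assumed loaded cost, not from the thread


def expected_outage_cost(probability: float, cost: float) -> float:
    """Expected loss if the risk is simply accepted."""
    return probability * cost


def tooling_cost(engineers: int, months: int, monthly_cost: float) -> float:
    """Direct cost of building the mitigation, ignoring opportunity cost."""
    return engineers * months * monthly_cost


accept = expected_outage_cost(OUTAGE_PROBABILITY, WORST_CASE_OUTAGE_COST)
mitigate = tooling_cost(2, 3, MONTHLY_COST_PER_ENGINEER)
print(f"accept the risk: ~${accept:,.0f} expected; build tooling: ~${mitigate:,.0f}")
```

With these placeholder values the tooling costs more than the outage would even if the outage were certain, and a $100k worst case at the same assumed probability still comes out well below the tooling cost.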

The difference between software development and "other forms of engineering" is that you can copy software rather easily, but you cannot copy a bridge. If you could, engineering would have the same issues with estimation.

In fact, take any engineering project that cannot be copied (like a new, big, custom airport) and you'll quickly see how much those "classical engineering estimations" are really worth.

A mature developer cannot magically give a better estimation. What they can do is communicate better, understand the value of POCs, know which parts of a project to tackle first to reduce uncertainty as early as possible, and correctly describe uncertainty (e.g. NOT with a single number).
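
One common way to avoid a single number is a three-point (PERT-style) estimate. This is a technique swapped in here for illustration, not something the comment above prescribes, and the class name and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePointEstimate:
    """An estimate given as optimistic / most likely / pessimistic values."""

    optimistic: float
    most_likely: float
    pessimistic: float

    @property
    def expected(self) -> float:
        # Classic PERT weighting: the most likely value counts four times.
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6

    @property
    def std_dev(self) -> float:
        # Conventional PERT approximation of the spread.
        return (self.pessimistic - self.optimistic) / 6

    def describe(self, unit: str = "days") -> str:
        return (
            f"between {self.optimistic:g} and {self.pessimistic:g} {unit}, "
            f"most likely {self.most_likely:g} "
            f"(weighted mean {self.expected:.1f}, spread ~{self.std_dev:.1f})"
        )


# Hypothetical example: a task guessed at 5 / 10 / 30 days.
estimate = ThreePointEstimate(optimistic=5, most_likely=10, pessimistic=30)
print(estimate.describe())
```

The point of the range is that the long pessimistic tail stays visible to whoever receives the estimate, instead of being collapsed into "10 days".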

> you can copy software rather easily, but you cannot copy a bridge

You can copy a bridge just as easily as you can copy software: just print another copy of the blueprints. Just change the header from the name of one project to another, and you're done with the engineering at that new site. What's that, the span length is different? The soil is different? The traffic patterns and weather patterns and political climate and regulations are different? Of course they are. And when you copy source code, the use case is different, the hardware is different, the database is different, the inputs are different, the client is different. Every engineering job is custom. Software and bridge. The two disciplines are not as different as you say.

Bridge engineering also undergoes an "agile" methodology as the plans are repeatedly changed during conversations with the client, discovery of new regulations, ground-truthing, etc. Remember: the outcome of engineering is plans, not a finished product.

You're talking about differences in constructing the thing, after the engineering is complete. That's largely irrelevant to the engineering cost.

Now you have copied the blueprint, but not the bridge. (And before you complain that software is also just a blueprint, the difference is this: in one case you estimate the blueprint and not its execution; in the other case you estimate both, or even just the execution.)

A lot of delays in engineering projects are caused by political pressure, by changes arriving mid-project for whatever reason, and by natural disasters or other physical events, which affect physical construction far more than software development. New regulations can also complicate things more than anticipated.