Hacker News new | ask | show | jobs
by pickledish 1064 days ago
This (the unimportant stud becoming load bearing later on) makes sense, but in my experience it’s kind of a sign of lazy design. When making software at least, you know when you’re trying to use a decorative stud to hold up part of your house, and choosing to do it anyway instead of building some new better structure does make for a pretty sad dev team later on

This is to say —- I agree with the article, but much nicer is to work at a place where you don’t expect to make this particular discovery very often, hah

9 comments

> you know when you’re trying to use a decorative stud to hold up part of your house

"You" might know it in "your" creations, but in my career I am much more often working and reworking in other people's creations.

I think the point of the article is not that you should avoid using decorative studs as load-bearing elements, but that you should be aware that others may have done so before you came along.

This is an even more conservative position than the default Chesterton's Fence reading, which is itself dismissed by a lot of people as pedantically restrictive.

For me, the parent article resonates. I have definitely had ceilings come crashing down on my head when I removed a piece of "ornamental" trim (programmatically speaking)

"This is an even more conservative position than the default Chesterton's Fence reading, which is itself dismissed by a lot of people as pedantically restrictive."

In a normal, real-life context, I can see why someone would feel that way.

In a software engineering context I think it's just a further emphasis that you ought to understand what something is doing before fiddling with it, and both the original intent and what it is currently doing are interesting information. Many times I've removed dead code, only to learn that not only was it alive (which wouldn't have been that surprising, it's easy to miss some little thing), but that it is very alive and actually a critical component of the system through some aspect I didn't even know existed, which means I badly screwed up my analysis.

The differences between the physical world and the software world greatly amplify the value of the Chesterton's Fence metric. In the physical world we can all use our eyes and clearly see what is going on, and while maybe there's still a hidden reason for the fence that doesn't leap to mind, it's still a 3-dimensional physical problem in the end. Software is so much more interconnected and has so much less regard for physical dimensions that it is much easier to miss relationships at a casual glance. Fortunately if we try, it's actually easier to create a complete understanding of what a given piece of code does, but it is something we have to try at... the "default view" we tend to end up with of software leaves a lot more places for things to hide on us. We must systematically examine the situation with intention. If you don't have time, desire, or ability to do that, the Chesterton's Fence heuristic is more important.

We're also more prone to drown under Chesterton's fences if we're not careful, though. I've been in code bases where everyone is terrified to touch everything because it seems like everything depends on everything and the slightest change breaks everything. We have to be careful not to overuse the heuristic too. Software engineering is hard.

In software, I would strongly disagree with this:

> you ought to understand what something is doing before fiddling with it

I think understanding before fiddling is one option. But I think a better option is often fiddling and seeing what happens. The trick is to make it so that fiddling is safe. E.g., on a project where I have good test coverage and find mystery code, I can just delete it and see what fails. Or I set up a big smoke test that runs a bunch of input and compares outputs to see what changes.

A lot of bad software is effectively incoherent, so it can't be understood as machinery. Instead it has to be understood in psychological, historical, or archaeological terms. "Well back then they were trying to achieve X, and the programmer doing it was not very experienced and was really interested in trendy approach Y so he tried using library Z, but it wasn't really suited for the problem at hand, so he misused it pretty severely."

That can be interesting, but it's often much more efficient to say, "Who cares how or why this got to be such a tangled mess. Let's solve for the actual current needs."

> But I think a better option is often fiddling and seeing what happens. The trick is to make it so that fiddling is safe. E.g., on a project where I have good test coverage and find mystery code, I can just delete it and see what fails. Or I set up a big smoke test that runs a bunch of input and compares outputs to see what changes.

There was a good discussion of this in the comments, starting at https://www.jefftk.com/p/accidentally-load-bearing#fb-109168...

In addition to tests, you can also add logging to your running system, or make the change behind a flag that you A/B test in production.

The problem “fiddle and see if Rome burns” is that you may have broken something that only runs once a year/decade and you won’t know if that process isn’t in the list of tests.

And by the time it breaks, will anyone remember the probable cause?

If people are building things that only run once a year or once a decade and haven't included sufficient tests to make sure those things work and keep working, then the failure lies with them, not with the people who work on the system later. The real "probable cause" isn't some later change. It's the initial negligent construction.
"But I think a better option is often fiddling and seeing what happens."

If you're safely fiddling with it, I would consider that part of the process of understanding. I'm particularly prioritizing understanding what is actually doing. Historical context is primarily useful because it will point you in the direction of what else you may need to look at; when I fail to realize that removing X broke Y it's because I didn't realize that there's a reason those are together.

I agree, but what I'm talking about is distinct from a usual "process of understanding", because the goal is explicitly remain as ignorant as possible. In many cases I want to learn the exact minimum needed to safely make a change. What were they thinking? What motivated them? Don't know, don't care. I just want to clean up the mess and move on to something actually useful and pleasant.

As a practical example, last year I was supposed to figure out why a data collection system was not getting all the data we wanted. The person who worked on it was long gone, so I looked at the code. It was a bunch of stuff using Apache Beam. No tests, no docs. The original author was not a very experienced programmer. It jumbled together a variety of concerns. And after a day of trying to understand, it became obvious to me that some of the uses of Beam were unconventional. Plus Beam itself is its own little world.

The next day, I said "fuck that", and wrote a very simple Python requests-based collector. I pretty quickly had something that was both much simpler and much more effective at actually getting the data. And from there I went into the usual sort of feedback-driven discovery process to figure out what the current users actually needed, with zero regard for original intent of the system was in solving problems for people no longer present.

What was the thing doing? Why did it to it? How did it end up that way? All excellent mysteries that will remain mysteries, as I eventually deleted all the Beam-related code and removed its scheduled jobs. For me, this ignorance was truly bliss. And for others too in that the users got their needs solved with less work, and that developers had something much cleaner and clearer to work with in the future.

> But I think a better option is often fiddling and seeing what happens.

Yeah, as always, IMMV.

But I do agree that online discourse puts too much emphasis on statically analyzing systems, and too little on adding instrumentation or just breaking it and seeing what happens.

At the same time, my experience is that on practice people put too much emphasis on instrumentation or just breaking it and seeing what happens, and way too little on statically analyzing the system.

> in my experience it’s kind of a sign of lazy design

Is it always lazy in the bad way, though? In software there's no sharp distinction between "built to carry weight" and "built to tack drywall onto." Whether a system is robust or dangerously unscalable depends on the context. You can always do the thought experiment, "What if our sales team doubled in size and then sold and onboarded customers as fast as they could until we had 100% of the market," and maybe using your database as a message queue is fine for that.

If it results in a sad dev team, then that's a case where it was a mistake. It's hard to maintain, or it's an operational nightmare. That isn't the inevitable result of using a (software) decorative stud as a (software) load-bearing element, though. There are a lot of systems happily and invisibly doing jobs they weren't designed for, saving months of work building a "proper" solution.

Let's say you code defensively. You add some handling invalid input to your function. Because the rest of your codebase never sends it invalid input it's dead code - not load bearing. Until at some point a bug is introduced and sends you invalid input which is then dutifully handled and recovered from. The branch has become load bearing.
I've been happily running a service that's non-critical, only to discover when we have an outage (that should be a non-event) that another team has started relying on it for something business critical.
This was famously a problem for Google's distributed lock service, Chubby. They handled it by intentionally having outages to flush out ways it might have started to bear loads it wasn't designed for: https://sre.google/sre-book/service-level-objectives/#xref_r...
I'm a fan of the 'chaos monkey' (Netflix software) approach of this.

Can't expect your platform to be reliable, if it just breaks at random.

> you know when

More often than not I've seen this happen because they, in fact, do not know.

Well, let's say you are working on a well engineered and tested product, and you look at the code coverage, and there's a whole lump of functionality that has no coverage.

You could conclude that the code is unnecessary and remove it, or you could conclude that some test cases need to be added to exercise it. How do you decide which is correct?

The problem is usually that well thought and and designed software was build for a moving target, and invariably things have changed over time. It's not necessarily a sign of lazy design, it's where the real world intersects with the nice neat pretend world we design for :)

I'm deeply frustrated at the missing answer here. read the code. figure out what's it for. take out it and see what breaks. software is not a house, you can completely wreck it and set in back exactly the way it was in a second.

there is no excuse for not owning and knowing the software you are supposed to be in control of.

Well, if it isn't covered, then you can take it out, and see what breaks (nothing) and then discover in production why it was there :)

But yes, it's basically what the job comes down to - having a strategy for managing complexity in all it's forms, and this is a fine example of the sort of problem that you don't learn in college.

I've (thankfully) never deprecated code and caused serious production issues, but i've seen it happen. The best places to work expect this sort of issue, and have process in place to roll back and deal with it, like any other business continuity issue (e.g. power/network loss).

The moment you find yourself scared of changing code because you don't understand the consequences then you've basically lost the battle.

hopefully there is some other place to try your code than in production. that gives you the agency to say "lets just take it out and see what happens".
>When making software at least, you know when you’re trying to use a decorative stud to hold up part of your house

Do you? Sometimes quick-one-time fixes becomes the center of important software.

>When making software at least, you know when you’re trying to use a decorative stud to hold up part of your house, and choosing to do it anyway instead

Whenever I find myself doing this, I at least leave a comment typically worded along the lines of "the dev is too lazy, not enough time to do it right, or just has no clue what to do, so here you go..."

s/lazy/lack of/

Hidden files in unix were a bug.

And then .DS_Store in MacOS was basically a bug, too. A lack of understanding coupled with a lack of design.