Hacker News
by mabbo 3155 days ago
My personal view is that redundancy on stupid things is okay. That is, things that are very simple, you should be okay with doing over and over in multiple modules if you need it.

The experience that led me to this was a particular team I was on where a previous developer had said "aha! All of these pieces of code relate to the same domain model, so we should build a common package of domain model objects and have all the modules share those objects! That way, we don't have to write them all over and over again!"

These weren't just model objects for data passing, but actual business logic inside these models. Usually very simple stuff, but occasionally a particular module would need some special logic on those objects, so in it went.

The problem was that over time, the modules started drifting. Their domains became ever so slightly different, and the owners of those modules became different teams. Now if you wanted to modify a domain object, you had to check it wouldn't break code the other teams were developing.

It became a real mess. And all to avoid writing the same simple pojo classes more than once.
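A minimal sketch of what the "redundant" alternative might look like, assuming two hypothetical modules (billing and shipping) that both deal with a customer. All class and field names here are made up for illustration; nothing in the story above specifies them.

```python
from dataclasses import dataclass


# Hypothetical billing module: owns its own tiny customer model.
@dataclass
class BillingCustomer:
    customer_id: str
    email: str
    tax_region: str  # only billing cares about this field

    def invoice_label(self) -> str:
        return f"{self.customer_id} ({self.tax_region})"


# Hypothetical shipping module: a near-duplicate, on purpose.
@dataclass
class ShippingCustomer:
    customer_id: str
    email: str
    postal_code: str  # only shipping cares about this field

    def parcel_label(self) -> str:
        return f"{self.customer_id} -> {self.postal_code}"


billing = BillingCustomer("c-1", "a@example.com", "EU")
shipping = ShippingCustomer("c-1", "a@example.com", "10115")
print(billing.invoice_label())
print(shipping.parcel_label())
```

The two classes repeat `customer_id` and `email`, but either team can add a field or change a method without checking with the other.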

8 comments

Something that helped me a lot was somebody pointing out that it's OK for code not to be 100% DRY - sometimes things are the same at a given point in time only by coincidence, not by some inherent logical connection. It's a mistake to refactor and remove these "redundant" parts because they are not truly redundant. This helps me chill out when deciding when it is useful to add an abstraction or a separate helper function or whatever.

Indeed. I don't know if it's my age or my domain, but this comes up a lot in the design of my own projects, and it's something I try to get younger programmers to think about. It might look weird to those who have rote-learned the "don't repeat yourself" mantra, or the culture of "just import everything from modules". Sometimes it's responsible to repeat yourself if you have good reason to believe things will shift/change in the future.

A lot of young data scientists and analysts (and IT types in general) code in a way that solves the immediate problem.

But with a bit of time you start to realise that the initial brief is only ever part 1: an executive or customer will change their mind 15 times before the end of the project. What seems like the same problem now will not be the same problem in two weeks, let alone two years. Doubly so if you're interacting with entities or sources that aren't software engineers.

Over the long run, excessive modularisation creates what I'll call Frankenstein programs. You've been tasked with making a man, and what you end up with is a shambling golem made from rotting, pulsating, mutating parts all stitched and held together. If you're really unlucky, it will evolve further into the Akira/Tetsuo program, where you begin to lose control of the mutations until it self-destructs under its own complexity.

The interesting part is that the answer to this can also be partly found in nature: you modularise and specialise, but you also make strategic choices where you're deliberately redundant.

Too much redundancy is spaghetti code. Modularisation and structure save you there.

Not enough redundancy leaves you vulnerable to changes in your environment and mutation as the project ages and evolves.

As I've gotten older, I'm placing more and more value on the latter. Your mileage may vary...

> Too much redundancy is spaghetti code.

Well, there's uncooked spaghetti code and cooked spaghetti code. :)

What I mean is that redundancy can be uniform, obvious, and easy to encapsulate later (if need be). Alternatively, it can be unpredictable, baroque, and difficult to reason about.

I think I've come to a different conclusion reading your rundown of this. The key paragraph is this:

> The problem was that over time, the modules started drifting. Their domains became ever so slightly different, and the owners of those modules became different teams. Now if you wanted to modify a domain object, you had to check it wouldn't break code the other teams were developing.

Isn't the real issue here that the architecture didn't keep pace with reality? It sounds like the dev who made the package had the right ideal. The real issue was subsequent devs introducing their domain specific stuff into a common package, instead of extending it or composing it with their own domain specific code.

The point of the article is that separating the common code into two different pieces is often better than "extending or composing" the shared code. Merging code together is fine if you're willing to separate it again, but lots of devs aren't.

Domain models can be shared as long as it's clear from the beginning that there is a "one true domain model" that the library is targeting, that everyone agrees on, and that has no reason to ever change. You see ADTs like this in language stdlibs and RDBMSes: they have domain objects for Datetimes, for IP addresses, for UUIDs, for URIs, etc.

Basically, if the semantics of your domain model are specified by an RFC, you can probably get away with turning them into a shared library dependency. Because, even if they weren't shared, everyone would just end up implementing exactly the same semantics anyway.

If someone's implementation was "off" from how the RFC did it, that implementation wouldn't just be different—it'd be wrong. There are no two ways of e.g. calculating the difference between two datetimes given a calendar. There's one correct way, and you can make a library that does things that way and be "done."
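To make the "specified once, shared by everyone" case concrete, here is a small sketch using Python's standard library as one example of such a shared domain model. The specific dates and addresses are arbitrary values picked for illustration.

```python
from datetime import datetime, timezone
import ipaddress

# Datetime arithmetic: the calendar rules (leap years included) live in
# one shared implementation rather than in each application.
start = datetime(2024, 2, 28, 12, 0, tzinfo=timezone.utc)
end = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
elapsed = end - start
print(elapsed.days)  # 2024 is a leap year, so Feb 28 -> Mar 1 is 2 days

# IP addresses: membership in a network has one agreed-upon meaning.
addr = ipaddress.ip_address("192.168.0.17")
net = ipaddress.ip_network("192.168.0.0/24")
print(addr in net)  # True
```

Nobody needs a module-specific variant of either type, which is what makes them safe to share.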

---

On the other hand, there is a good reason that Rails et al don't automatically create a migration that creates a User model for you. Every app actually has slightly different things it cares about related to the people using it, and it calls these things a "User", but these aren't semantically the same.

In a microservice architecture, service A's conception of a User won't necessarily have much to do with service B's conception of a User. Even if they both rely on service C to define some sort of "core User" for them (e.g. the IAM service on AWS), both services A and B will likely have other things they want to keep track of related to a User, that is core to how they model users. It might be neatly separated in the database, but it'll be hella inconvenient if it can't be brought together (each in its own way) in the respective domain models of A and B.
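A rough sketch of that arrangement, assuming a hypothetical `CoreUser` provided by a shared identity service and two services that each wrap it in their own model. Every name below is invented for illustration and does not correspond to any real IAM API.

```python
from dataclasses import dataclass, field
from typing import Set


# Hypothetical "core User" as exposed by a shared identity service (service C).
@dataclass(frozen=True)
class CoreUser:
    user_id: str
    email: str


# Service A (say, billing): its own idea of what a user is.
@dataclass
class BillingUser:
    core: CoreUser
    plan: str = "free"

    def can_be_invoiced(self) -> bool:
        return self.plan != "free"


# Service B (say, a forum): a different idea built on the same core.
@dataclass
class ForumUser:
    core: CoreUser
    display_name: str
    muted_ids: Set[str] = field(default_factory=set)

    def can_see_posts_by(self, author_id: str) -> bool:
        return author_id not in self.muted_ids


core = CoreUser("u-42", "sam@example.com")
billing_user = BillingUser(core, plan="pro")
forum_user = ForumUser(core, display_name="sam", muted_ids={"u-7"})
print(billing_user.can_be_invoiced())
print(forum_user.can_see_posts_by("u-7"))
```

Only the identity fields are shared. Everything each service considers core to its own notion of a user stays in that service's model.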

> On the other hand, there is a good reason that Rails et al don't automatically create a migration that creates a User model for you.

Once you ignore frameworks like Django, which provide you with a user model, then yes, you'll be correct.

My current approach is to be liberal with redundancy as I begin a prototype and then gradually replace it with dependencies as my code matures. This approach seems most realistic to me and I have used it to much success.

Ill-formed or too many dependencies seriously constrain speed of development and, much worse, they suck the fun and flexibility out of development. "Build tools" that manage dependencies are usually opaque about what's gone wrong. Not fun. I've spent too many hours trying to find "which version of this library does this function need?"

Too much redundancy is a bug magnet and I cannot emphasize enough how much I loathe it. There really is no good answer to 'Why is this function duplicated with one extra parameter?'

One way to theorize about that experience is to note that the important argument for reuse is that changes only have to be made once to take effect everywhere. The classic change being the bugfix.

But the buried assumption there is that you actually want the change to happen everywhere. A hard thing to decide without hindsight.

I like this. Perhaps the heuristic for refactoring redundancy should be not how many times you've written the code, but how many times you've made the same fix in multiple places.

Default behaviors with custom overrides are the way I've always dealt with such "redundancy". In the end, tracing down the default behavior with its own default behaviors often causes me to pull them out as simple pojos. I think there is no silver bullet, but that redundancy has a usability X factor that often goes unaccounted for, in the abstract.

It would be nice to fork a class and use it in your project customized to your needs. The forks need never be re-integrated upstream, but that they are forked is documented and the possibility for reuse and generalization is at least a bit more likely.

Indeed. But the political problem is that you can only make this argument retrospectively. When you say this is how it will turn out, you’re advocating for redundancy and everyone learned in Comp Sci 101 or Smashing Magazine or whatever that redundancy is bad. It takes longer to understand why dependencies are bad, or in fact worse, as TFA says.