by cookiecaper 3365 days ago
Following the BottledWater-Pg link through to the Confluent blog [0], there's a hilarious illustration of what we're doing to ourselves.

The first flow chart [1] is simple. It shows a user, an app, and three separate data stores that each serve a separate use case (DB, cache, and index [presumably for OLAP workloads]). The chart is headlined with the adamant imperative "Stop doing this".

Instead, Confluent suggests that we start doing this [2]: user -> app -> DB -> extraction -> Kafka -> (Index <-> Cache <-> HDFS) -> monitoring -> Samza. Ehhh, no thanks. I like the other option.

We need to understand that good engineering is not about making more work for ourselves. It's about simplicity and elegance, and being able to accomplish complex tasks WITHOUT wrapping ourselves up into some intractable mega-contortion. More moving parts means more fragility and more waste. Simplicity means beauty, power, and flexibility.

Now, I'm not suggesting that such architectures are never justified. I just want to highlight that the complexity should be eschewed, not celebrated.

If you find yourself writing a blog post that converts a simple 3-step process into a complex 5-step, 9-destination process, alarm bells should be ringing, and you should be talking about why your organization (see Conway's Law) and/or the state of computer science sucks so bad that the 3-step process isn't good enough.

[0] https://www.confluent.io/blog/bottled-water-real-time-integr...

[1] https://cdn2.hubspot.net/hub/540072/file-3062873213-png/blog...

[2] https://cdn2.hubspot.net/hub/540072/file-3062873223-png/blog...


I've seen this blog post and I find the 'Confluent' method simpler than the other method.

The problem of "I have to get a large portion of the DB into service X" is one I've worked on, so the initial solution is more fragile. It doesn't deal with back pressure. If a service goes down, it "loses" writes and must be resynced from a good state. If for whatever reason data science sets up an HDFS cluster, I need to push writes there from my app.

With the second method - I don't have to use all those services - and while I'm not given the same latency guarantees I can be more sure that a user's given change will eventually end up in every service that cares about that given change.
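To make the "eventually ends up in every service" point concrete, here is a toy sketch of the log-based fan-out idea, assuming a single append-only list stands in for Kafka and each downstream store keeps its own read offset. `ChangeLog`, `Consumer` and `catch_up` are hypothetical names for illustration, not Kafka's API. The point it shows: a store that is down while writes happen resumes from its saved offset instead of losing them.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical append-only log of change events (a toy stand-in for something like Kafka)."""
    entries: list = field(default_factory=list)

    def append(self, event):
        self.entries.append(event)

    def read_from(self, offset):
        # Return every event at or after the given offset.
        return self.entries[offset:]


@dataclass
class Consumer:
    """A downstream store (cache, index, HDFS...) that tracks its own read position."""
    name: str
    applied: list = field(default_factory=list)
    offset: int = 0

    def catch_up(self, log):
        # Apply everything this consumer hasn't seen yet, advancing its offset.
        for event in log.read_from(self.offset):
            self.applied.append(event)
            self.offset += 1


log = ChangeLog()
cache = Consumer("cache")
index = Consumer("index")

log.append({"user": 1, "name": "alice"})
cache.catch_up(log)
index.catch_up(log)

# The index is "down" while two more changes are written; only the cache keeps up.
log.append({"user": 2, "name": "bob"})
log.append({"user": 1, "name": "alicia"})
cache.catch_up(log)

# When the index comes back, it resumes from its saved offset; nothing is lost.
index.catch_up(log)
```

The trade-off is the one described above: the index is stale while it is down (no latency guarantee), but it converges to the same sequence of changes as everything else.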

Sure, if you only need to write to one DB, the Confluent method is overkill. However, if that solution works for you, I'd imagine you haven't hit the volume and latency requirements that would require you to seek out a solution like Confluent's anyway.

>The problem of "I have to get a large portion of the DB into service X" is one I've worked on, so the initial solution is more fragile. It doesn't deal with back pressure. If a service goes down, it "loses" writes and must be resynced from a good state. If for whatever reason data science sets up an HDFS cluster, I need to push writes there from my app.

It's hard for me to discuss this because the terms are loosely defined, but my feeling is that you may be making implicit false assumptions around the necessary design of the architecture.

>Sure, if you only need to write to one DB, the Confluent method is overkill. However, if that solution works for you, I'd imagine you haven't hit the volume and latency requirements that would require you to seek out a solution like Confluent's anyway.

This explanation is probably the reason for the explosion in overengineering. People hear "Hey, if you're not making things really hard, you're just not important enough!"

Well, everyone thinks they're important, so of course, they must make things hard! If they don't, they're not important.

I work with an organization where most people insist we are at this scale. It's totally false. Our load could easily be managed by one well-tuned database replication setup per app and probably 3-4 app servers per app. But this isn't good enough, because, you see, we are very important.

That means that we have dozens of different types of data storage solutions scattered all over the place (including Mongo, Riak, and Dynamo in addition to a variety of SQL DBs), we have dozens of "microservices", and we have hundreds of app servers, even though the technical requirements could be fulfilled with much, much less.

So why do we have all that? Well, because we're "at scale", which is to say, we want to be important. We have a bunch of people sitting around an office all day who appreciate the feeling of importance more than the feeling of a well-engineered system.

Again, I'm not saying complicated architectures are never justified, but I think that in many if not most cases, complication arises due to organizational and personal psychology much more than any technical constraint that truly mandates it.

What the first chart doesn't show you is how much stuff is going on in the "web app" box. If you have e.g. a User entity, you would have to add code to write to the 3 secondary datastores at every point where a User is written. And if you add another datastore, you have to go back, find every write, and modify it.

In the second example you handle it once per entity and datastore and the "web app" doesn't need to be aware of the secondary stores.

It's just the listener pattern applied at infra level. You trade code complexity for infra complexity.
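For comparison, here is the same listener idea at the code level rather than the infra level: a minimal sketch, assuming an in-process event bus and plain dicts standing in for the DB, cache and search index. `EventBus`, `UserRepository` and the `"user_saved"` event name are all hypothetical. Each secondary store is wired up once per entity, and the code that saves users never mentions them.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal in-process listener registry (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._listeners: dict = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._listeners[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Call every handler registered for this event type, in registration order.
        for handler in self._listeners[event_type]:
            handler(payload)


class UserRepository:
    """The only place the app writes users; it knows nothing about secondary stores."""

    def __init__(self, bus: EventBus) -> None:
        self.db: dict = {}
        self.bus = bus

    def save(self, user_id: int, data: dict) -> None:
        self.db[user_id] = data
        self.bus.publish("user_saved", {"id": user_id, **data})


cache: dict = {}
search_index: dict = {}


def update_cache(user: dict) -> None:
    cache[user["id"]] = user


def update_index(user: dict) -> None:
    search_index[user["name"]] = user["id"]


bus = EventBus()
# Adding a new secondary store means one more subscribe call, not a hunt for every write.
bus.subscribe("user_saved", update_cache)
bus.subscribe("user_saved", update_index)

repo = UserRepository(bus)
repo.save(1, {"name": "alice"})
repo.save(2, {"name": "bob"})
```

Unlike the log-based version, this is synchronous and in-process: if a handler raises, the exception propagates out of `save`, and nothing remembers the missed write.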

The question is where the complexity has the biggest cost. In most cases, it should be pretty easy to finagle a multi-write process into your existing data layer, and to do so without introducing new technologies/stacks/layers. The logic is thus unified and simple, in a single place, in a format that the company already has significant expertise in.
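A rough sketch of what "multi-write in the existing data layer" could look like, assuming dicts and plain functions stand in for the real stores. `UserStore`, `save`, `retry_pending` and the simulated outage flag are hypothetical. One method owns every write for the entity, and a failed secondary write is queued for replay rather than silently dropped, which answers part of the "loses writes" objection without new infrastructure.

```python
class UserStore:
    """Hypothetical data-layer class: one method owns every write for the User entity."""

    def __init__(self, primary, secondaries):
        self.primary = primary            # source of truth, e.g. the SQL DB (a dict here)
        self.secondaries = secondaries    # name -> callable(user_id, data)
        self.pending = []                 # failed secondary writes kept for a later retry

    def save(self, user_id, data):
        self.primary[user_id] = data      # write the source of truth first
        for name, write in self.secondaries.items():
            try:
                write(user_id, data)
            except Exception:
                # Don't silently drop the write; remember it so it can be replayed.
                self.pending.append((name, user_id, data))

    def retry_pending(self):
        still_failing = []
        for name, user_id, data in self.pending:
            try:
                self.secondaries[name](user_id, data)
            except Exception:
                still_failing.append((name, user_id, data))
        self.pending = still_failing


cache = {}
index = {}
index_up = False  # simulated outage of the index (hypothetical)


def write_cache(user_id, data):
    cache[user_id] = data


def write_index(user_id, data):
    if not index_up:
        raise ConnectionError("index unavailable")
    index[user_id] = data


store = UserStore({}, {"cache": write_cache, "index": write_index})
store.save(1, {"name": "alice"})  # the cache write succeeds; the index write is queued

index_up = True
store.retry_pending()             # the queued index write is replayed
```

The obvious caveat is that `pending` lives in memory, so a crash between the failure and the retry still loses the write. A durable queue or periodic resync would cover that, and that gap is roughly where the log-based design starts to earn its extra moving parts.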

That is much easier to track, trace, understand, and debug than a request that flows through 7 servers, 5 data layers (that have their own configs, nuances, snafus, caveats, etc.), and 2 proxies, each of which introduces a point of potential corruption/breakage, before it finally reaches the place it's trying to go.

Anyway, like I said in the grandparent, I'm sure there are times when this is the best way to accomplish something. It's hard to nitpick an unfortunately-potentially-appropriate specific solution in the general sense, except to say that its potential applicability is unfortunate.

The problem is that people do not see such complexity for the unfortunate byproduct of poor technical and/or organizational architecture that it is, but rather as evidence of their own expertise. That is exactly backwards. We must fix that false impression to restore sanity to the profession.

Yup. I'm fully with you. You need to have evidence of serious, unavoidable problems before you go down the route of having lots of distributed systems and communication between them, because those systems are far harder to reason about globally and to debug. There's a high risk of undesirable emergent properties and cascading failure modes.