by jedberg 2436 days ago
It's interesting to me that they are going down this path instead of the microservices path. This seems like something ripe for slowly breaking down into microservices.

Someone made a change that took down production because of non-deterministic outcomes? How about breaking out whatever they were changing into its own service? With proper fallbacks, breaking that part shouldn't take down all of production again.

To be clear, I'm not saying microservices will solve all their problems or be less work. I'm just saying that for the same level of effort, they would probably get more overall reliability from multiple services. They'd also be able to use whichever language suits each task, deploy even more often with less risk, and confine this kind of "change on import" behavior to a much smaller surface in any given deployment.
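What "proper fallbacks" could mean in practice: a minimal Python sketch, assuming the caller wraps each call to a broken-out service with a time limit and a safe default, so that a failing or slow service degrades one feature instead of the whole page. The helper `call_with_fallback` and the services `fetch_recommendations` and `fetch_trending` are hypothetical names for illustration, not anything Instagram is known to use.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared worker pool for outbound calls (a sketch; real clients manage their own connections).
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_fallback(call: Callable[[], T], fallback: T, timeout_s: float = 0.2) -> T:
    """Run `call` with a time limit; return `fallback` if it raises or runs too long."""
    future = _POOL.submit(call)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Catches the timeout as well as any error raised by `call`.
        # Note: a timed-out call keeps running in its worker thread.
        return fallback


def fetch_recommendations(user_id: int) -> list[str]:
    # Hypothetical broken-out service that is currently down.
    raise ConnectionError("recommendation service unavailable")


def fetch_trending() -> list[str]:
    # Hypothetical service that is up but slow today.
    time.sleep(0.5)
    return ["late result"]


recommendations = call_with_fallback(lambda: fetch_recommendations(42), fallback=[])
trending = call_with_fallback(fetch_trending, fallback=[], timeout_s=0.05)
print(recommendations, trending)  # [] []
```

The page still renders, just without those two sections. Choosing a sensible fallback value per call site is the real work here; an empty list is only appropriate for optional features.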


>Someone made a change that took down production because of non-deterministic outcomes? How about breaking out whatever they were changing into its own service? With proper fallbacks, breaking that part shouldn't take down all of production again.

Yeah, now you'll have 10 interconnected services, 10x the complexity, and every one of them will be able to take down all or large parts of production, plus all the extra pain points of a distributed system...

You won't have 10 times the complexity if you are taking a monolith and turning each section into a service. You'll have the same dependency graph; it will just use the network to make calls between them instead of being local.

You'll have added complexity with the network calls, which is why I said it wouldn't be any less work, just different work.
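A minimal Python sketch of the "same dependency graph, different transport" point, under assumed names: `UserNames`, `LocalUserNames`, `RemoteUserNames`, and the JSON request format are all hypothetical, and `fake_transport` stands in for a real network client. The caller depends on the same interface either way; the remote version adds serialization plus a transport step that, in real life, is where the new failure modes live.

```python
import json
from typing import Callable, Protocol


class UserNames(Protocol):
    """What callers depend on; the dependency graph is the same either way."""

    def get_name(self, user_id: int) -> str: ...


class LocalUserNames:
    # Monolith version: an in-process function call.
    def get_name(self, user_id: int) -> str:
        return f"user-{user_id}"


class RemoteUserNames:
    # Service version: same method, but the request is serialized and sent over a transport.
    def __init__(self, transport: Callable[[str], str]) -> None:
        self._transport = transport  # hypothetical, e.g. an HTTP or RPC client

    def get_name(self, user_id: int) -> str:
        reply = self._transport(json.dumps({"method": "get_name", "user_id": user_id}))
        return json.loads(reply)["name"]


def render_profile(users: UserNames, user_id: int) -> str:
    # Caller code is identical for both implementations.
    return f"<h1>{users.get_name(user_id)}</h1>"


def fake_transport(request: str) -> str:
    # Stands in for the network; a real transport can time out, drop, or return errors here.
    payload = json.loads(request)
    return json.dumps({"name": LocalUserNames().get_name(payload["user_id"])})


print(render_profile(LocalUserNames(), 7))                 # <h1>user-7</h1>
print(render_profile(RemoteUserNames(fake_transport), 7))  # <h1>user-7</h1>
```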

>You won't have 10 times the complexity if you are taking a monolith and turning each section into a service. You'll have the same dependency graph; it will just use the network to make calls between them instead of being local.

Merely "use the network to make calls between them instead of being local" will add 10 times the complexity: you suddenly have a distributed system, with latency, delays, parts that can be up or down independently, decentralized configuration (which can also get out of sync), and so on.

>it will just use the network to make calls between them

meaning that you get to throw network and server errors into the mix of things that can go wrong, and you get the fun of tracing failures back 3 hops to a server that decides to take too long to run a process one day and times out a connection downstream.

it's horrible debugging stuff like this.
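A minimal asyncio sketch of the three-hop failure described above, with three hypothetical services (`frontend`, `feed_service`, `ranking_service`) simulated as coroutines in one process. The slow third hop trips the second hop's timeout, and the first hop fails even though its own budget was generous. Attaching the hop name when re-raising is one common way to keep that trace readable; it is an assumption added here, not something the commenter proposed.

```python
import asyncio


async def ranking_service() -> str:
    # Hop 3 (hypothetical): one day it takes 0.3 s instead of its usual few ms.
    await asyncio.sleep(0.3)
    return "ranked feed"


async def feed_service() -> str:
    # Hop 2 (hypothetical): allows hop 3 only 0.1 s before giving up.
    try:
        return await asyncio.wait_for(ranking_service(), timeout=0.1)
    except asyncio.TimeoutError as exc:
        # Re-raise with the hop name attached; without this, hop 1 sees a bare timeout.
        raise RuntimeError("feed_service: ranking_service timed out after 0.1 s") from exc


async def frontend() -> str:
    # Hop 1: has a generous 1 s budget, yet still fails because of hop 3.
    try:
        return await asyncio.wait_for(feed_service(), timeout=1.0)
    except RuntimeError as exc:
        return f"frontend error: {exc}"


result = asyncio.run(frontend())
print(result)  # frontend error: feed_service: ranking_service timed out after 0.1 s
```

Without the re-raise in `feed_service`, the frontend would only see a `TimeoutError`, which is exactly the "trace it back 3 hops" problem.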

Beyond increasing complexity, I think this also assumes a dependency graph that _can_ be broken down into microservices by the author/the author's team. In my experience, a lot of things at this scale have such complex dependencies that untangling them is difficult, if not impossible, without asking several teams to do something differently. And who knows how long that will take?

That's why you do it slowly. You take a small part of the monolith and make a service that does the same thing. Then you replace the code in the monolith with a call to the service, while keeping track of how often it is called in the monolith.

As you keep moving along, some callers that depend on that first piece will start calling the new service directly, while others will still call it through the monolith. But your tracking will tell you how often and who is doing that, so you can find out why.

In the meantime, nothing will break, because the monolith is still a pass-through proxy to your service.
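The incremental approach above can be sketched in Python, assuming one extracted function and in-process call tracking. `new_user_service_get_name`, `monolith_get_user_name`, `legacy_callers`, and the two render functions are hypothetical names; a real system would make a network call and record callers in metrics rather than a `Counter`.

```python
import inspect
from collections import Counter

# Counts of which functions still call the legacy (monolith) entry point.
legacy_callers: Counter[str] = Counter()


def new_user_service_get_name(user_id: int) -> str:
    # Hypothetical extracted service; in production this would be a network call.
    return f"user-{user_id}"


def monolith_get_user_name(user_id: int) -> str:
    """Legacy entry point, kept as a pass-through proxy to the new service."""
    caller = inspect.stack(0)[1].function  # name of the function that called us
    legacy_callers[caller] += 1
    return new_user_service_get_name(user_id)


def render_profile() -> str:
    # Not migrated yet: still goes through the monolith.
    return monolith_get_user_name(7)


def render_comments() -> str:
    # Already migrated: calls the new service directly.
    return new_user_service_get_name(7)


render_profile()
render_profile()
render_comments()
print(legacy_callers)  # Counter({'render_profile': 2})
```

Once `legacy_callers` stays empty for long enough, the proxy function can be deleted from the monolith.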

I think your comment makes perfect sense.

However, at their scale and with their engineering resources, I can imagine that an attitude of "we can make this work" (with the monolith) is easier to justify. The same could be said of the microservices approach, except that there you would also have to justify changing something that has worked so far.

I'd love to read more about the history behind this approach at Instagram.

Regardless of whether the monolith or microservices approach is the right way to go for their use case: I could very well imagine that it is too late for such a migration, and that it would hold them back for too long.