by meaty 4860 days ago
I love all these articles which are wholly ignorant of how complex software is in reality and how such advice isn't necessarily good.

Sometimes decomposition results in problems at the other end of the scale, such as communication performance, data duplication, extremely nested abstractions, messaging complexity, contract and API versioning hell, etc.

Getting the sweet spot between monolithic coupled blobs and fragmented latent deathtraps is an art which can't be puked out in a blog post. It takes literally years of experience and some guesswork and testing and thinking.

Ultimately, lots of small programs are just as painful as a single large one if they have to talk to each other or do IO.

7 comments

I'm pretty much convinced that the big problem today in software is deciding where your APIs should go.

My instincts are similar to the article's author: a preference for small, discrete pieces of software rather than a giant monolithic application. More Web Services!, if you will.

But you are correct, getting this sweet spot is hard. Truth be told, I am not even sure experience guarantees a successful design first time around.

There is a tendency among developers to want to keep everything nice and clean. For example, app A is responsible for a data set, and any time other apps want to access it they have to talk to A. If they are asking for that data a lot, you might be better off caching it or periodically copying it over to the other parts of the system. I always try to decide whether to segment something by thinking about how many calls it is likely to receive as a web service: more than a couple in short succession and I start having doubts.
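To make the caching idea concrete, here is a minimal sketch of a client-side cache with a time-to-live. Everything in it is hypothetical: `CachedClient`, the `fetch` callable standing in for a call to app A, and the 60 second default are illustrations, not anything from the article.

```python
import time


class CachedClient:
    """Wraps a fetch function (a stand-in for a call to app A) and
    reuses each result until it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # hypothetical remote call
        self._ttl = ttl_seconds      # how long a cached value stays fresh
        self._clock = clock          # injectable so the expiry can be tested
        self._cache = {}             # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no call to app A
        value = self._fetch(key)     # missing or stale: ask app A again
        self._cache[key] = (now + self._ttl, value)
        return value
```

The trade-off is the usual one: fewer calls to A in exchange for data that may be up to `ttl_seconds` out of date.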

Ultimately, what makes the entire system work best for users is the correct thing to do, and sometimes it is very difficult to come up with something which does that and is pleasing to the discerning coder's eye.

Adding to that, his stated advantage of not having to limit yourself to one platform also seems opposite of my experience (i.e. keeping all on the same platform is an advantage).

When you have a big system with disjoint parts written in different languages, re-use and refactoring is a pain, and redundancy is almost certain to creep in (and with redundancy often comes inconsistency).

Yes, homogeneous systems are much easier to deal with, although from experience certain systems are a pig to deal with from end to end (anything Microsoft as a rule).

Different languages are just different forms of integration, and the mantra "integration is hell" should be at the forefront of everyone's mind, always.

Depends on the tools you use. I found Apache Thrift and Protobuf to be pretty sophisticated tools for integration between services.

Yet they are still entirely impractical for what we do. There is no one-size-fits-all methodology which results in a homogeneous communication layer. This means that you end up with technology fragmentation and therefore additional complexity.

I was just about to make the same argument. I completely see why this is tempting, but it quickly makes maintenance into a new layer of hell, and anyone supporting your production environment will hate you. Added to which, knowledge transfer becomes a huge problem, and it takes new developers and production support people a small lifetime to learn all the pieces and their touchpoints.

The problem with corporate, big-program development is that it's a premature abstraction.

If the system-of-small-programs doesn't perform, then you're in a state where larger programs might make sense. If the problem is well-understood and the pieces have been built and refined by competent programmers, but it's impossible to go any further without some coupling and integration, then a large program isn't the worst thing in the world. Really, that's what most "optimization" is: the use of about-the-system knowledge to make changes that, while they create couplings that exclude (by which I mean, may cause horrible things to happen, but that's irrelevant) unused cases, improve the performance of the used cases.

For example, with databases, you have requirements that are both technically challenging but also need to work together: concurrency, persistence, performance, transactional integrity. These involve an ability to reason about "the whole world" that can't be achieved with a system-of-small-things approach. That's a case where "bigness" actually imposes complexity reduction. But it has taken some very smart people decades to get this stuff right.

The problem with ad-hoc corporate big-program systems is that the one benefit of largeness-- complexity reduction-- never occurs because there is no conceptual integrity, but only a heterogeneous list of "requirements" that pile on and don't work together. You get the ugliness of "lots of small programs" but the APIs aren't even documented. Instead of reading crappy APIs to work on such systems, programmers have to read crappy code, which is even harder.

Small is the way to start. If you need to make a program large, there are intelligent ways of doing it, but it's best to start small and build enough knowledge so that, when largeness becomes necessary, the problem is actually well-understood.

Most enterprise systems start off with requirements similar to those you think of with a database - a lot of data with high expectations of performance.

For example, the program I work on has to support a million row database that can be sorted and filtered both on the server and client with subsecond response time. The program is incredibly configurable based on data in the system, so many of the features depend on reading data and reacting to it.

The problem with "many small programs" is the cost of communication. I can pass a pointer to a list of 100,000 items to be sorted and filtered in a trivial amount of time. If I have to serialize that list to json to pass to a separate program that then has to deserialize that list and perform the function, then reserialize the sorted/filtered list, send it back, re-deserialize.... it'll take longer to do the communication than it does to do the sort.
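A rough way to see this cost is to time both paths on the same data. The sketch below is an assumption-laden illustration, not a benchmark of any real system: `sort_and_filter` is a made-up workload, and `via_json` only simulates the serialize/deserialize steps of a round trip to a separate program, with no network or process hop at all.

```python
import json
import random
import time


def sort_and_filter(items):
    """Made-up workload: keep the even values and return them sorted."""
    return sorted(x for x in items if x % 2 == 0)


def in_process(items):
    # Same process: the list is passed by reference, nothing is copied.
    return sort_and_filter(items)


def via_json(items):
    # Simulates the round trip to a separate program, minus the transport.
    request = json.dumps(items)                        # caller serializes
    received = json.loads(request)                     # service deserializes
    response = json.dumps(sort_and_filter(received))   # service reserializes
    return json.loads(response)                        # caller deserializes


def timed(fn, items):
    start = time.perf_counter()
    result = fn(items)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    random.seed(0)
    data = [random.randrange(1_000_000) for _ in range(100_000)]
    direct, t_direct = timed(in_process, data)
    remote, t_remote = timed(via_json, data)
    assert direct == remote  # both paths must produce the same answer
    print(f"in-process: {t_direct:.4f}s, JSON round trip: {t_remote:.4f}s")
```

The actual numbers depend on the machine and the data, and a real service call would add transport latency on top of the serialization measured here.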

However, that's not to say that the idea of separation of concerns can't still be applied to a large program. And in fact, most enterprise devs do exactly that. That's what all these "services" are in the program. Except that instead of having to serialize data, I can just pass them a pointer.
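As a sketch of what such in-program "services" might look like, assuming hypothetical `FilterService`, `SortService` and `ReportService` classes and a made-up row shape with a `total` field: each class has one job, and they hand each other references to the same row objects rather than serialized copies.

```python
class FilterService:
    """Single concern: choose which rows to keep."""

    def apply(self, rows, predicate):
        return [row for row in rows if predicate(row)]


class SortService:
    """Single concern: order rows."""

    def apply(self, rows, key):
        return sorted(rows, key=key)


class ReportService:
    """Composes the other two; only references to rows are passed around."""

    def __init__(self, filters, sorter):
        self._filters = filters
        self._sorter = sorter

    def top_customers(self, rows, min_total):
        kept = self._filters.apply(rows, lambda row: row["total"] >= min_total)
        return self._sorter.apply(kept, key=lambda row: -row["total"])
```

For example, `ReportService(FilterService(), SortService()).top_customers(rows, 100)` returns the very same dict objects that were passed in, just filtered and reordered.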

Just because you can't see all the different programs, doesn't mean they're not there.

I'm going to point to a development case which is outside the typical "database management" case everyone here seems to be thinking about: engineering modeling software like this: http://www.aspentech.com/fullproductlisting/

This company is ostensibly doing the right thing: they have developed a large number of "single purpose" programs. They also have some applications which attempt to integrate some of their technology into single packages. The problems, however, are exactly as you describe. From an end user perspective, having the various programs send data to one another is a crapshoot. Some applications are very tightly integrated while others seem to have been developed in a vacuum. The company has even developed an entire application that tries to fix this by allowing data to be automatically exported to and imported from Excel. End users could try to use the COM interface to get and send data where they want it to go, but we have to remember that the target audience is engineers, not programmers.

> Most enterprise systems start off with requirements similar to those you think of with a database - a lot of data with high expectations of performance.

Not where I work.

And even then, this is no excuse to stick to a zeroth order heuristic, and make big programs every time. Some systems can be cleanly separated in simple components. Failing to see that is a waste.

> Most enterprise systems start off with requirements similar to those you think of with a database - a lot of data with high expectations of performance.

Right, but there's a different process to it.

Databases solved a problem, and the requirements grew organically as people used them to solve harder problems. With product companies or with open-source software, the project owners can say, "We aren't doing that shit".

Enterprise projects accumulate requirements based on who has power within the organization. Each person who has the power to stop the project asks for a hand-out, and "We aren't doing that shit" isn't an option. It's like how businesses that want to operate in corrupt countries need to have a separate "bribe fund" for local officials. Over time, the result is an incoherent mess of requirements that make no sense together.

The requirement list for a typical enterprise project is the bribe trail.

> However, that's not to say that the idea of separation of concerns still can't be applied to large program. And in fact, most enterprise devs do exactly that. That's what all these "services" are in the program. Except that instead of having to serialize data, I can just pass them a pointer.

Sure, but when you have a multi-developer project without an explicit API, what you end up with is an undocumented and implicit API between people's code. This devolves into the software-as-spec situation where it's not clear what the rules are.

I think it's better to start with the inefficient service-oriented program, get that working, and then optimize with the merged, larger program if needed (and to document the API that has now become an implicit within-program beast).
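One way to picture that path, as a sketch only: write the API down as an explicit interface, start with an implementation that crosses a serialization boundary, and swap in a merged in-process one later without touching callers. The `Inventory` interface, both implementations and `can_ship` are hypothetical names invented for this example, and the JSON boundary is a stand-in for a real service call.

```python
import json
from typing import Protocol


class Inventory(Protocol):
    """The explicit, documented API that callers depend on."""

    def stock_level(self, sku: str) -> int:
        """Return units in stock for sku, or 0 if the sku is unknown."""
        ...


class InProcessInventory:
    """Merged version: a plain method call on in-memory data."""

    def __init__(self, levels):
        self._levels = dict(levels)

    def stock_level(self, sku: str) -> int:
        return self._levels.get(sku, 0)


class JsonBoundaryInventory:
    """Service-style version: every call crosses a JSON boundary.
    Here the 'remote' side is just another object in the same process."""

    def __init__(self, backend: Inventory):
        self._backend = backend

    def stock_level(self, sku: str) -> int:
        request = json.dumps({"sku": sku})
        received_sku = json.loads(request)["sku"]
        response = json.dumps({"level": self._backend.stock_level(received_sku)})
        return json.loads(response)["level"]


def can_ship(inventory: Inventory, sku: str, quantity: int) -> bool:
    # Caller code is written against the interface, not an implementation.
    return inventory.stock_level(sku) >= quantity
```

Because `can_ship` only knows about `Inventory`, moving from the service-style implementation to the merged one is a change at the construction site rather than in every caller.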

> The requirement list for a typical enterprise project is the bribe trail.

I think this is purely a stereotype.

The behavior experienced is largely down to the fact that a large body of humans can't come up with a single consistent view of a large set of problems. You need singular control and ownership by someone with technical and business domain expertise. Some of this is politics (particularly from the MBA and psychotic corporate climber faction) but it's at least 80% standard human idiocy and ignorance.

I think from an architecture perspective (I'm an "enterprise architect" [whatever that is] by trade), clean service APIs are a good idea, but not necessarily the distribution model or fully decoupled integration path.

I totally agree that software development is somewhat non-trivial. I'm not saying that small programs would solve all the problems you might face. Would your system be more maintainable? I think yes.

> Getting the sweet spot between monolithic coupled blobs and fragmented latent deathtraps is an art which can't be puked out in a blog post. It takes literally years of experience and some guesswork and testing and thinking.

I agree. In the blog post, proper decomposition is the key if you want to write good systems, and to be honest it is really hard to achieve.

[Edit: deleted pointless bit here.]

If your comment consisted simply of its second and fourth paragraphs it would be better in every way and you would have contributed something of value.

I've read a thousand versions of this blog post over the years. My comment is decidedly abrasive because, to be honest, I'm tired of reading it. There is nothing new being added to the discussion, just people blindly falling over the same point, which is ignorant, not very well thought out, and not based on reality.

Apologies if you are personally offended, but my point still stands.

It's disgusting that your comment - a nasty, nearly content-free spit of cynical contrarianism - is currently the most highly-ranked reply to what was actually a congenial and well thought out blog post.