by michaelochurch 4860 days ago
The problem with corporate, big-program development is that it's a premature abstraction.

If the system-of-small-programs doesn't perform, then you're in a state where larger programs might make sense. If the problem is well-understood and the pieces have been built and refined by competent programmers, but it's impossible to go any further without some coupling and integration, then a large program isn't the worst thing in the world. Really, that's what most "optimization" is: using knowledge about the whole system to make changes that improve the performance of the used cases, at the cost of couplings that exclude the unused cases (by "exclude" I mean horrible things may happen in those cases, but that's irrelevant).

For example, with databases, you have requirements that are each technically challenging and also need to work together: concurrency, persistence, performance, transactional integrity. These require an ability to reason about "the whole world" that can't be achieved with a system-of-small-things approach. That's a case where "bigness" actually brings complexity reduction. But it has taken some very smart people decades to get this stuff right.

The problem with ad-hoc corporate big-program systems is that the one benefit of largeness-- complexity reduction-- never occurs because there is no conceptual integrity, but only a heterogeneous list of "requirements" that pile on and don't work together. You get the ugliness of "lots of small programs" but the APIs aren't even documented. Instead of reading crappy APIs to work on such systems, programmers have to read crappy code, which is even harder.

Small is the way to start. If you need to make a program large, there are intelligent ways of doing it, but it's best to start small and build enough knowledge so that, when largeness becomes necessary, the problem is actually well-understood.


Most enterprise systems start off with requirements similar to those you think of with a database - a lot of data with high expectations of performance.

For example, the program I work on has to support a million-row database that can be sorted and filtered on both the server and the client with sub-second response time. The program is incredibly configurable based on data in the system, so many of the features depend on reading data and reacting to it.

The problem with "many small programs" is the cost of communication. I can pass a pointer to a list of 100,000 items to be sorted and filtered in a trivial amount of time. If I have to serialize that list to JSON, pass it to a separate program that deserializes it, performs the function, and reserializes the sorted/filtered list, and then deserialize the result on my side, the communication will take longer than the sort.
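A rough way to check that claim on your own machine is sketched below. The function names, the 100,000-item test list, and the threshold are made up for illustration, and the "separate program" is only simulated by a JSON round trip inside one process, so it leaves out pipe or network cost and probably understates the real gap. No timings are claimed here; they depend on the machine and the data.

```python
import json
import time


def sort_and_filter(items, threshold):
    """Keep items at or above the threshold, sorted ascending."""
    return sorted(x for x in items if x >= threshold)


def in_process(items, threshold):
    # The callee gets a reference to the same list; nothing is copied on the way in.
    return sort_and_filter(items, threshold)


def via_json(items, threshold):
    # Simulated separate program (an assumption, no real IPC happens here):
    # serialize the request, deserialize it, do the work,
    # serialize the response, deserialize it again.
    request = json.dumps({"items": items, "threshold": threshold})
    payload = json.loads(request)
    response = json.dumps(sort_and_filter(payload["items"], payload["threshold"]))
    return json.loads(response)


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Deterministic pseudo-shuffled integers, purely illustrative data.
    items = [(i * 7919) % 100_003 for i in range(100_000)]
    local, t_local = timed(in_process, items, 50_000)
    remote, t_remote = timed(via_json, items, 50_000)
    assert local == remote  # both paths must agree on the answer
    print(f"in-process: {t_local:.4f}s, JSON round trip: {t_remote:.4f}s")
```

Both paths return the same list; the only difference is how much work is spent moving the data around.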

However, that's not to say that the idea of separation of concerns can't still be applied to a large program. And in fact, most enterprise devs do exactly that. That's what all these "services" are in the program. Except that instead of having to serialize data, I can just pass them a pointer.
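As a minimal sketch of what such in-process "services" might look like (all class names, method names, and row fields here are hypothetical, not taken from any real system): each service owns one concern, and the rows handed between them are the same objects, never copies.

```python
class FilterService:
    """Hypothetical service: owns filtering and nothing else."""

    def apply(self, rows, predicate):
        return [row for row in rows if predicate(row)]


class SortService:
    """Hypothetical service: owns sorting and nothing else."""

    def apply(self, rows, key):
        return sorted(rows, key=key)


class ReportService:
    """Composes the other two services through plain method calls."""

    def __init__(self, filters, sorter):
        self.filters = filters
        self.sorter = sorter

    def open_orders_by_total(self, rows):
        open_rows = self.filters.apply(rows, lambda r: r["status"] == "open")
        return self.sorter.apply(open_rows, key=lambda r: r["total"])


if __name__ == "__main__":
    rows = [
        {"id": 1, "status": "open", "total": 30},
        {"id": 2, "status": "closed", "total": 10},
        {"id": 3, "status": "open", "total": 20},
    ]
    report = ReportService(FilterService(), SortService())
    result = report.open_orders_by_total(rows)
    # The result holds references to the original dicts: no serialization, no copies.
    assert result[0] is rows[2]
    print([r["id"] for r in result])
```

The concerns are separated at the class boundary, but crossing that boundary costs a function call rather than a serialization round trip.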

Just because you can't see all the different programs, doesn't mean they're not there.

I'm going to point to a development case which is outside the typical "database management" case everyone here seems to be thinking about: engineering modeling software like this: http://www.aspentech.com/fullproductlisting/

This company is ostensibly doing the right thing: they have developed a large number of "single purpose" programs. They also have some applications which attempt to integrate some of their technology into single packages. The problems, however, are exactly as you describe. From an end user perspective, having the various programs send data to one another is a crapshoot. Some applications are very tightly integrated while others seem to have been developed in a vacuum. The company has even developed an entire application that tries to fix this by allowing data to be automatically exported to and imported from Excel. End users could try to use the COM interface to get and send data where they want it to go, but we have to remember that the target audience is engineers, not programmers.

> Most enterprise systems start off with requirements similar to those you think of with a database - a lot of data with high expectations of performance.

Not where I work.

And even then, this is no excuse to stick to a zeroth-order heuristic and make big programs every time. Some systems can be cleanly separated into simple components. Failing to see that is a waste.

> Most enterprise systems start off with requirements similar to those you think of with a database - a lot of data with high expectations of performance.

Right, but there's a different process to it.

Databases solved a problem, and the requirements grew organically as people used them to solve harder problems. With product companies or with open-source software, the project owners can say, "We aren't doing that shit".

Enterprise projects accumulate requirements based on who has power within the organization. Each person who has the power to stop the project asks for a hand-out, and "We aren't doing that shit" isn't an option. It's like how businesses that want to operate in corrupt countries need to have a separate "bribe fund" for local officials. Over time, the result is an incoherent mess of requirements that make no sense together.

The requirement list for a typical enterprise project is the bribe trail.

> However, that's not to say that the idea of separation of concerns can't still be applied to a large program. And in fact, most enterprise devs do exactly that. That's what all these "services" are in the program. Except that instead of having to serialize data, I can just pass them a pointer.

Sure, but when you have a multi-developer project without an explicit API, what you end up with is an undocumented and implicit API between people's code. This devolves into the software-as-spec situation where it's not clear what the rules are.

I think it's better to start with the inefficient service-oriented program, get that working, and then optimize with the merged, larger program if needed (and to document the API that has now become an implicit within-program beast).
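One way to keep that option open, sketched here with hypothetical names: write the API down as an explicit interface first, put the slow serialized implementation behind it, and swap in an in-process implementation later without touching the callers. `SerializedSorter` only simulates a separate service with JSON strings; it is not a real transport.

```python
import json
from typing import List, Protocol


class Sorter(Protocol):
    """The explicit, documented API that callers depend on."""

    def sort_desc(self, values: List[int]) -> List[int]:
        ...


class SerializedSorter:
    """Stand-in for a separate service: everything crosses the boundary as JSON."""

    def _handle(self, request: str) -> str:
        values = json.loads(request)["values"]
        return json.dumps({"values": sorted(values, reverse=True)})

    def sort_desc(self, values: List[int]) -> List[int]:
        response = self._handle(json.dumps({"values": values}))
        return json.loads(response)["values"]


class LocalSorter:
    """The later optimization: same API, merged into the calling program."""

    def sort_desc(self, values: List[int]) -> List[int]:
        return sorted(values, reverse=True)


def top_three(sorter: Sorter, values: List[int]) -> List[int]:
    # The caller only knows the Sorter API, not which implementation is behind it.
    return sorter.sort_desc(values)[:3]


if __name__ == "__main__":
    data = [4, 1, 9, 7, 3]
    assert top_three(SerializedSorter(), data) == top_three(LocalSorter(), data)
    print(top_three(LocalSorter(), data))
```

Because the interface was written down while the service was still separate, merging it into the larger program does not turn it into an implicit API.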

> The requirement list for a typical enterprise project is the bribe trail.

I think this is purely a stereotype.

The behavior experienced is largely down to the fact that a large body of humans can't come up with a single consistent view of a large set of problems. You need singular control and ownership by someone with technical and business domain expertise. Some of this is politics (particularly from the MBA and psychotic corporate climber faction) but it's at least 80% standard human idiocy and ignorance.

I think from an architecture perspective (I'm an "enterprise architect" [whatever that is] by trade), clean service APIs are a good idea, but not necessarily the distribution model or fully decoupled integration path.