Hacker News
by _kblcuk_ 1942 days ago
> In my experience, they're always wrong. These systems can be run locally during development with a relatively small investment of effort. Typically, these systems are just ultimately not as complicated as people think they are; once the system's dependencies are actually known and understood rather than being cargo-culted or assumed, running the system, and all its dependencies, is straightforward.

Whenever I hear these statements, it always sounds like "I need to have an identical copy of this skyscraper in order for me to be able to replace one tap on floor 42."

Also, good luck running a system that operates on a few hundred terabytes of, for instance, YouTube data locally.

Also running the whole system locally is usually a pretty good way of creating a "distributed monolith" -- yea, there might be microservices, but also a dozen assumptions here and there that different parts of the system are being deployed simultaneously (usually they are not), or that certain distant parts of the whole system share some behavior that can be changed simultaneously.

So no, you don't need to run the whole system locally. On the contrary, you need to be able to run the smallest part of it (hello, microservice) locally, and that part should be responsible for one thing. APIs and frontends can easily share a JSON schema to make sure they send and receive valid data, and each service can have tests against that schema, the ultimate source of truth for them.
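
A minimal sketch of the schema-as-contract idea above, in Python. Everything named here is an assumption for illustration: `USER_SCHEMA`, `get_user_handler`, and the tiny `violations` checker are hypothetical, and the checker only understands `type`, `required`, and `properties`. A real project would more likely keep the schema in a shared JSON file and use a full JSON Schema validator.

```python
import json

# Hypothetical shared contract. In practice this would probably live in a
# JSON file that both the API and the frontend load.
USER_SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["id", "name"],
  "properties": {
    "id": {"type": "integer"},
    "name": {"type": "string"},
    "email": {"type": "string"}
  }
}
""")

# Mapping from the JSON Schema type names used above to Python types.
_JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def violations(payload, schema):
    """Return a list of schema violations (empty if the payload is valid).

    Deliberately tiny: handles only `type`, `required`, and `properties`.
    """
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(payload, _JSON_TYPES[expected])
        # bool is a subclass of int in Python, so reject it for "integer".
        if expected == "integer" and isinstance(payload, bool):
            ok = False
        if not ok:
            return [f"expected {expected}, got {type(payload).__name__}"]

    errors = []
    if isinstance(payload, dict):
        for key in schema.get("required", []):
            if key not in payload:
                errors.append(f"missing required field: {key}")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in payload:
                errors.extend(
                    f"{key}: {err}" for err in violations(payload[key], sub_schema)
                )
    return errors


def get_user_handler(user_id):
    """Hypothetical API handler under test; no other service is running."""
    return {"id": user_id, "name": "Ada"}


if __name__ == "__main__":
    # The service's own test: its response must satisfy the shared schema.
    assert violations(get_user_handler(1), USER_SCHEMA) == []
    # A payload the frontend must never send or accept.
    print(violations({"id": "1"}, USER_SCHEMA))
```

The frontend side would run the same kind of check against the same schema, so neither side needs the other running to know it still honors the contract.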

Boom, suddenly I can develop the "big system" piece by piece in isolation on my 7-year-old MacBook with no problems, against tests / storybook / debugger.


>Whenever I hear these statements, it always sounds like "I need to have an identical copy of this skyscraper in order for me to be able to replace one tap on floor 42."

The building industry also has the problem the OP describes:

https://i.stack.imgur.com/yHGn1.gif

The problem as I see it is that people who go all in on unit tests tend to be dogmatic about it and suffer the above type of issue, whereas the people who want, broadly speaking, to run things as realistically as possible are pretty aware of the real constraints.

Moreover, modeling at a larger scale is also frequently cheaper, because the real thing often comes for free, while creating elaborate, frequently changing unit test mocks has very high opex.

Oh yea, and also you get that fuzzy "I clicked random things around, so everything should work, because I run everything locally" feeling. ¯\_(ツ)_/¯
> Also, good luck running a system that operates on a few hundred terabytes of, for instance, YouTube data locally.

Why can't that be scaled down to a few hundred megs of data running through a system, end-to-end, on the local machine?

Not being able to scale down seems like a code smell to me. Is there so much overhead in the microservices that even at idle, or very low usage, they can't even be started?

For instance, PostgreSQL's performance might be very different depending on the size of the data a query has to operate on. Also, if you deal with something like search results, it gets pretty annoying to deal with "well, I don't have enough data locally for this particular thing I develop".

Again, talking here about the "run the world and click on things" approach vs developing against tests / API schemas / whatnot.

I've had pretty good results with scaling down real-world data and running tests against it.
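
One way this kind of scaling down could look, as a rough sketch rather than a description of any particular setup. The table shapes (`users`, `videos`, `owner_id`) and the helper names `keep` and `scale_down` are made up for illustration. The idea is to sample by a hash of the key instead of at random, so the same rows are picked on every run and child rows follow their parents.

```python
import hashlib


def keep(key: str, percent: int) -> bool:
    """Deterministically keep roughly `percent`% of keys.

    Hashing the key (instead of calling random) means every developer and
    every run selects the same subset.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def scale_down(users, videos, percent):
    """Return a sample of users plus only the videos owned by those users.

    Filtering child rows by the sampled parent ids keeps the subset
    referentially consistent, so the system can still run end-to-end on it.
    """
    kept_users = [u for u in users if keep(str(u["id"]), percent)]
    kept_ids = {u["id"] for u in kept_users}
    kept_videos = [v for v in videos if v["owner_id"] in kept_ids]
    return kept_users, kept_videos


if __name__ == "__main__":
    # Hypothetical stand-ins for rows exported from the real system.
    users = [{"id": i} for i in range(1000)]
    videos = [{"id": i, "owner_id": i % 1000} for i in range(5000)]
    small_users, small_videos = scale_down(users, videos, percent=10)
    print(len(small_users), len(small_videos))
```

The sample will not reproduce production query plans or rare data shapes, which is the realism/cost trade-off being discussed.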

Postgres performance will be different from prod, yes. That's all part of the realism/cost trade-off. All models are wrong, some are useful.

The point is that diving headfirst into making matchstick model unit tests is dumb when you could build something to test against at reasonable cost which is a lot more representative of reality.

IMO this is an obvious point if you adopt a cost/benefit approach to development but it's often impossible to see for the dogmatists.

I think we're talking about slightly different things here. Running the service that I need to develop right now with a lot of local data (or even a read-only connection to a production mega-db) is one thing. But needing to run that service with a lot of data only to be able to develop some other part of the system (that is not even related to that particular service, but "nothing works" without it) is a pretty annoying development experience.
Load / performance testing isn't always feasible in a local environment, sure. Everything else, though? It ought to be.
What if the error only occurs after ~1TB of data has been created?
Well then you have to do some bug hunting, but at least you'd have a fully working version of the system in miniature that you can test any hypotheses against.

To be honest, short of issuing every developer with their own production-grade environment to debug against, I am not sure what exactly would satisfy this line of questioning.

There's nothing wrong with running your app locally. We're writing software, not building skyscrapers.

You don't need hundreds of terabytes to run an app with test data.

Monoliths aren't bad.

Running an app locally -- yes, there's nothing wrong with that. Running 25 different apps that compose "the platform" only to be able to develop something -- sounds like overkill to me.
If you have 25 apps to run locally, chances are that your system is over-engineered.
It depends what kind of system you are running.

NASA had two versions of the space shuttle software developed in parallel by teams at IBM and Rockwell. None of the devs had a whole space shuttle.

Of course there are always exceptions. More than likely, people on here aren't thinking about NASA software.
Which is probably why they ran it all locally first!
> You don't need hundreds of terabytes to run an app with test data.

Well, you could just mock the data and code, but then, according to the author:

> "Run an individual service against mocks" doesn't count. A mock will rarely behave identically to the real dependency, and the behavior of the individual service will be unrealistic. You need to run the actual system.

If your data is too large to replicate locally then downloading a subset of data is better than nothing.
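
On the quoted concern that a mock rarely behaves like the real dependency: one hedged way to see (and limit) that drift is to run the same checks against both the fake and something real. The sketch below is only an illustration. `InMemoryStore`, `SqliteStore`, and `check_contract` are hypothetical names, and an in-memory SQLite database stands in for "the real dependency".

```python
import sqlite3


class InMemoryStore:
    """Hand-written fake of a key/value dependency."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class SqliteStore:
    """Stand-in for the real dependency, backed by in-memory SQLite."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

    def put(self, key, value):
        # INSERT OR REPLACE overwrites an existing row with the same key.
        self._db.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )

    def get(self, key):
        row = self._db.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def check_contract(store):
    """Behavior both implementations must share; fails if either drifts."""
    assert store.get("missing") is None
    store.put("a", "1")
    assert store.get("a") == "1"
    store.put("a", "2")  # overwriting must be allowed
    assert store.get("a") == "2"


if __name__ == "__main__":
    for store in (InMemoryStore(), SqliteStore()):
        check_contract(store)
    print("fake and real agree on the contract")
```

This only covers the behavior someone thought to write down, so it narrows the gap the author describes rather than closing it.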
Test data is not a mock.

What matters here are the odds that the production system will behave the same way as your tests. For functional requirements (not performance), the normal situation is that if you replace your data, those odds do not decrease a lot. If you want to test performance, that changes, and you may need an environment as large as your production one.

> and you may need an environment as large as your production one.

Yes, but isn't that exactly what the author suggests?

The author is suggesting running your code (all of it) in a separate environment that isn't prod. There is a passing acknowledgment that data exists, but nothing more about it. Very likely, he doesn't talk about data because bringing all of your data into another environment is indeed not viable for a lot of people, or even legal for some.

If you replace all of your data, it's still your code running there. But you must have some data, or your code won't run, and it must look like real data, or your environment will be fake again... where "look like real data" is completely problem dependent.

If your assumptions of a system fail on a single box they will fail more dramatically when they are distributed. If your system is designed such that it can't be executed on a single box then you're setting yourself up for a world of pain.

> Also running the whole system locally is usually a pretty good way of creating a "distributed monolith"

I'm not sure how you came to this conclusion. Nowhere does he mention monoliths or microservices. Even if that distinction is made it is good practice to be able to run your architecture on a single box as a goal.

Going through that exercise alone will force the designer to think about how the application scales up and down, about having representative samples of data for test suites, about coupling between services, and so on.

The orders of complexity increase dramatically for each component running on a separate machine for the system under inspection.

Being charitable, the things you mention do have their place but they don't address many of the problems you'll face when building a complex system.