Hacker News
by the_local_host 1942 days ago
> Also good luck running a system that operates on a few hundred terabytes of, for instance, YouTube data locally.

Why can't that be scaled down to a few hundred megs of data running through a system, end-to-end, on the local machine?

Not being able to scale down seems like a code smell to me. Is there so much overhead in the microservices that even at idle, or very low usage, they can't even be started?

2 comments

For instance, PostgreSQL's performance might be very different depending on the size of the data a query has to operate on. Also, if you deal with something like search results, it gets pretty annoying to deal with "well, I don't have enough data locally for this particular thing I develop".

Again, talking here about the "run the world and click on things" approach vs developing against tests / API schemas / whatnot.

I've had pretty good results with scaling down real-world data and running tests against it.
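A rough sketch of what "scaling down" can look like, purely as an illustration. The `videos` / `comments` schema, the function names, and the use of SQLite are all assumptions made up for this example and are not anything described in the thread. The idea is to sample parent rows deterministically and copy only the child rows that reference them, so the miniature dataset stays internally consistent.

```python
import sqlite3
import zlib


def keep(key: str, percent: int) -> bool:
    """Deterministically keep roughly `percent`% of keys (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % 100 < percent


def scale_down(src: sqlite3.Connection, dst: sqlite3.Connection, percent: int) -> None:
    """Copy a sample of `videos`, plus only the `comments` that reference them."""
    # Hypothetical schema, invented for this sketch.
    dst.execute("CREATE TABLE videos (id TEXT PRIMARY KEY, title TEXT)")
    dst.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, video_id TEXT, body TEXT)"
    )
    kept = set()
    for vid, title in src.execute("SELECT id, title FROM videos"):
        if keep(vid, percent):
            kept.add(vid)
            dst.execute("INSERT INTO videos VALUES (?, ?)", (vid, title))
    # Child rows follow their parents, so no comment points at a missing video.
    for cid, vid, body in src.execute("SELECT id, video_id, body FROM comments"):
        if vid in kept:
            dst.execute("INSERT INTO comments VALUES (?, ?, ?)", (cid, vid, body))
    dst.commit()


def make_fake_source(n_videos: int) -> sqlite3.Connection:
    """Stand-in for the real 'production' data: n videos with 3 comments each."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE videos (id TEXT PRIMARY KEY, title TEXT)")
    src.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, video_id TEXT, body TEXT)"
    )
    for i in range(n_videos):
        src.execute("INSERT INTO videos VALUES (?, ?)", (f"v{i}", f"title {i}"))
        for j in range(3):
            src.execute(
                "INSERT INTO comments (video_id, body) VALUES (?, ?)",
                (f"v{i}", f"comment {j}"),
            )
    src.commit()
    return src


if __name__ == "__main__":
    source = make_fake_source(1000)
    small = sqlite3.connect(":memory:")
    scale_down(source, small, percent=10)
    n = small.execute("SELECT COUNT(*) FROM videos").fetchone()[0]
    print(f"kept {n} of 1000 videos")
```

Hashing the key rather than using a random draw means every developer ends up with the same subset, which keeps test failures reproducible. It does nothing for the query-planner differences mentioned above; it only makes the end-to-end path runnable on one machine.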

Postgres performance will be different from prod, yes. That's all part of the realism/cost trade-off. All models are wrong, some are useful.

The point is that diving headfirst into making matchstick model unit tests is dumb when you could build something to test against at reasonable cost which is a lot more representative of reality.

IMO this is an obvious point if you adopt a cost/benefit approach to development but it's often impossible to see for the dogmatists.

I think we're talking about slightly different things here. Running the service that I need to develop right now with a lot of local data (or even a read-only connection to the production mega-db) is one thing. But needing to run that service with a lot of data only to be able to develop some other part of the system (one that is not even related to that particular service, but "nothing works" without it) is a pretty annoying development experience.

Load / performance testing isn't always feasible in a local environment, sure. Everything else, though? It ought to be.

What if the error only occurs after ~1TB of data has been created?

Well then you have to do some bug hunting, but at least you'd have a fully working version of the system in miniature that you can test any hypotheses against.

To be honest, short of issuing every developer with their own production-grade environment to debug against, I am not sure what exactly would satisfy this line of questioning.