by nucotano 3525 days ago
I feel concerned by this new trend of using slow languages/stacks and the "we'll fix it later" mantra. I can't believe there are webpages such as reddit with such horribly awful generation times. Do their tech guys sleep well at night?
3 comments

This is not a new trend. This has been a trend since the dawn of, well, as long as I've been programming, ~20 years now. Turns out that most of the time there are many more important things than your stack or language speed, and yes, oftentimes "fix it later" is the right tradeoff.
On the other hand, there are things where "fix it later" just won't work. Especially when it comes to scalability, which has to be considered right from the start to get it right.
Yes and no. Building Facebook-scale technology when you have 30 requests per minute is a horrible use of your time and money. The goal is to come up with the scale you need to fund your next scale growth. I’m working on a system that needs to handle a few thousand simultaneous users in bursts, and at our growth rate that looks like it might grow to a couple of tens of thousands of simultaneous users in bursts. We can horizontally scale for “hot” periods (like Black Friday weekend).

We’re working on designing our next iteration so we can handle tens to hundreds of thousands of constant users, and a couple of million in bursts, with the capability to horizontally scale or partition or use some other mechanism to buy us more time when we need to look at the next scaling level (which will probably be some time away). It would be irresponsible of me to design for 10MM constant users now.

Considering scalability from the start does not just mean optimizing for millions of concurrent users, but choosing your software stack or your platform with scalability in mind. I get that it's important to take the next step and that premature optimization can stand in your way, but there are easy-to-use technologies (like NoSQL, caching) and (cloud) platforms with low overhead that let you scale with your customers and work whether you're big or small. This can be far superior to fixing throughput and performance from iteration to iteration.
I chose my software stack with scalability in mind: team scalability and rapid iteration (we started with Rails, displacing a Node.js implementation that was poorly designed and messily implemented). Because of that previous proof-of-concept implementation we needed to replace, we were forced into a design (multiple services with SSO) that complicated our development and deployment, but has given us the room to manoeuvre while we work out the next growth phase (which will be a combination of several technologies, including Elixir, Go, and more Rails).

One thing we didn’t choose up front, because it’s generally a false optimization (it looks like it will save you time, but in reality it hurts you unless you really know you need it), is anything NoSQL as a primary data store. We use Redis heavily, but only for ephemeral or reconstructable data.
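To illustrate what "ephemeral or reconstructable" can mean in practice, here is a minimal Python sketch of the cache-aside pattern. It is not the code described in this comment: `EphemeralCache` is a hypothetical in-memory stand-in for a Redis-like store with expiry, and `get_profile` and the `db` dict are made-up names standing in for a lookup against the primary (SQL) data store.

```python
import time


class EphemeralCache:
    """Hypothetical in-memory stand-in for a Redis-like store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be exercised without sleeping

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped; the caller rebuilds them.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def flush(self):
        """Simulate losing the whole cache (restart, eviction, failover)."""
        self._data.clear()


def get_profile(user_id, cache, db, ttl_seconds=60):
    """Cache-aside read: try the cache first, fall back to the primary store."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    # `db` is assumed to be the source of truth; here it is just a dict.
    profile = db[user_id]
    cache.set(key, profile, ttl_seconds)
    return profile
```

The point of the sketch is that `flush()` costs nothing but a slower next read: everything in the cache can be rebuilt from the primary store, so losing it is never data loss.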

The reality is, though, you have to know and learn the scalability that your system needs and you can only do that properly by growing it and not making the wrong assumptions up front, and not trying to take on more than you are ready for. (For me, my benchmark was a fairly small 100 req/sec, which was an order of magnitude or two larger than the Node.js system we replaced, and we doubled the benchmark on our first performance test. We also reimplemented everything we needed to reimplement in about six months, which was the other important thing. My team was awesome, and most of them had never used Rails before but were pleased with how much more it gave them.)

I think the main argument for (distributed) NoSQL as a primary data store is availability, but there are other ways to achieve that too.
NoSQL is not easy to use. At least not easy to use correctly in failure conditions, if your data has any complexity to it at all.
You're right, NoSQL systems tend to be more complex, and failure scenarios especially are hard to comprehend. In most cases, however, this is because they are distributed datastores, where tradeoffs, administration and failure scenarios are simply much more complex. I think some NoSQL systems do an outstanding job of hiding nasty details from their users.

If you compare using a distributed database to building sharding yourself for, say, your MySQL-backed architecture, NoSQL will most certainly be the better choice.

I'll admit, though, that dealing with NoSQL when you come from a SQL background isn't easy. Even finding the database that fits your needs is tough. We have a blog post dedicated to this challenge: https://medium.baqend.com/nosql-databases-a-survey-and-decis...

> which has to be considered right from the start to get it right.

I totally disagree. Twitter did not consider getting scalability right from the start, nor did Amazon, nor did Uber. But when scalability of systems was key to scaling their businesses they found a way. Pre-optimization can kill companies, because running out of money kills businesses.

Security is another one where the "fix it later" mentality leaks in, with the resulting consequences!
It’s all about the acceptable trade-offs. There are certain things in security that I am not willing to compromise on; there are other things where I’m not as concerned. We currently don’t use HTTPS inside our firewall; once you’ve passed our SSL termination, we don’t use SSL again until outbound requests happen that require SSL.

Should we? Well, it depends. There are things that I’m concerned about which would recommend it to us, but it’s not part of our current threat model because there are more important problems to solve (within security as well as without).

I don't disagree. There are always engineering trade-offs. What I take issue with is sites that do not bother to even think about security. They operate under the false sense of security that no one will bother them.
I'm surprised you would mention Reddit in this context. It seems to me they're one of the lightest websites out there, which is not really surprising seeing as they're mostly serving text. Reddit and HN are pretty much the only two websites which are usable on my Raspberry Pi 2s.
What would you do if you were in charge of engineering at reddit?