by ajmurmann 931 days ago
How is that the case? This example uses a distributed MySQL cluster, which was of course tuned for high performance. Similarly, the Rails app is distributed as well. Arguably the Rails app wouldn't qualify as "high performance", but it's distributed.
4 comments

It amazes me that even when we have numbers, people still dismiss Rails.

Nothing will run at that scale on a single VPS. All companies will have a wide range of languages used.

If this is not Rails supporting high traffic, then what more do we need?

Sorry, I love Rails, but just because something can scale (which I never thought it couldn't) doesn't make it a high-performance system. That's totally fine; Rails makes other tradeoffs that IMO are more universally useful, even though some people seem unable to understand that server cost for most companies is tiny compared to developer cost.
For some reason, some people will discount any example of Rails scalability as not counting.
They're talking about "distributed" as in a system of services communicating, rather than just copies of the same monolith across multiple instances. The former adds communication and synchronisation overheads and the complexities of failover for every extra service introduced.
That's a totally bizarre definition. Having worked on a high-performance in-memory data grid for the last eight years, I can guarantee that you'll get all the fun distributed systems problems even with a single code base. That definition also excludes pretty much all famous distributed systems like most databases, messaging systems like Kafka and Rabbit etc.

What you seem to be getting at isn't distributed systems, but the totally self-inflicted pain of a service-oriented architecture.

> Having worked on a high-performance in-memory data grid for the last eight years, I can guarantee that you'll get all the fun distributed systems problems even with a single code base.

Having spent the last 28 years building distributed network-connected systems, this comes across as wildly obtuse.

The point is that there is an orders-of-magnitude difference in complexity between scaling a system with few communication paths and little distribution of state across process or network boundaries and scaling one with many paths and state distributed in many locations. We don't tend to start talking about distributed systems when you have a tiered stack of a horizontally scalable component sandwiched between a load balancer and a database, even though in a very strict technical sense that is already "distributed".

Once you start adding message queues etc., it certainly becomes more and more reasonable to talk about a distributed system, but there is a distinct grey area there as well, with respect to the intent clearly expressed by the original comment, when dealing with e.g. queues that just trigger jobs in the same code base against the same database.

Put another way, ignore the word "distributed", re-read the original comment, and consider that irrespective of which label you're comfortable with, what the comment is doing is drawing a distinction between two classes of systems with wildly different complexity in the distribution of responsibility and state. Where precisely you draw the line is entirely irrelevant.

> What you seem to be getting at isn't distributed systems, but the totally self-inflicted pain of a service-oriented architecture.

No, it really was not. This separation between basic 2/3 tier apps and systems with a more complex data flow pre-dates the SOA buzzword literally by decades.

Maybe the distinction here is which scope the respective maintainer cares about. For Shopify, MySQL is mostly a black box: they don't need to re-implement their own atomic commit protocol, network partition detection, etc., since MySQL did that for them. The implementors of MySQL did have to solve these distributed systems problems and pick their CAP trade-offs, but I guess that's not the scope Shopify cares about here.
Isn't the full set of these numbers definitionally "high performance"?
Oh, I read the parent comment to thank them for confirming that "you likely don't have a problem that is hard enough to justify it". But reading it again, it could be read both ways.

Edit: To be clear, I agree that this is an example of a distributed, high-performance system, which is why the comment made little sense to me.

Yes, if you take distributed to just mean "the same code on multiple machines". The GP above probably means "different code on different machines interacting", which brings its very own set of problems.
By that definition, pretty much any problem you study in distributed systems theory can occur in a system that doesn't fit the definition, and the most well-known examples of distributed systems, like distributed data stores, message queues, etc., aren't distributed systems.