by mjackson 4177 days ago
Using Go (or Scala or Java or whatever) doesn't magically scale a large system.

Scaling problems in large systems are rarely solved at the micro level, i.e. you don't "scale" by simply gaining the ability to run more operations in a single thread. This is always the problem with "language X is more scalable than Y" debates. In my experience, scale has little to do with the language and everything to do with how you use it.

This post brings to mind another written by Alex Payne a few years back about scaling in the large vs. scaling in the small:

https://al3x.net/2010/07/27/node.html

The danger with making hand-wavy claims with very few technical details is that it perpetuates the notion that there are magic bullets out there that will make your system "scale" if you use them. In the past few years we've seen quite a few large companies in SV and SF make the switch to some shiny new language (Scala, anyone?) only to find, several million dollars and many months later, that they're still having conversations about how to make stuff scale. Only now they're talking about making it happen in a language that is fairly new to their organization and with which they have only a few real experts.

6 comments

Your comment reads to me as if you think the author implied that Go "magically" provides scalability. I inferred nothing of the sort. Rather, I assumed they wanted to use a language which provides concurrency as a primitive, to make it easier for them to write concurrent code.

This switch (Python to Go) for this motivation (better concurrency support) seems reasonable to me, particularly since Python does not have a good concurrency story. (I love Python. But I would not choose it if I wanted high concurrency.)

Also you should consider that the more you scale, the more "scaling in the small" saves you money/machines/operators/...

Why? Less resource-hungry code requires fewer resources and doesn't cause problems quite so quickly (though it tends to cause harder problems). Eight years ago, on a site that required more than 30 servers to run, I moved the directory to tmpfs and did some serious SQL optimization in the PHP code. The month after that I turned off 20 of them (actual servers, plus caching and load-balancing servers that weren't necessary anymore).

Everything you said is true, of course, but I think it applies more generally to inexperienced startups with first-time technical founders, or larger corporations with a "B" technical staff.

In this case, you're talking about Dropbox, which has already achieved a scale, in both the large and small senses, that requires a deeper level of problem solving than simply switching languages would provide.

Even if I personally find Go an odd choice for this project (why not C++, which already has the library support they need, and much more?), their present-day successes demonstrate that they didn't expect a scalability panacea from Go, just a better runtime for a subset of their critical path code.

Nothing is magic, but in this case moving from a language with a GIL to a language built for (single-machine) concurrency probably helps scaling a lot.

The OP says as much in the comments: "For us, one of the biggest latency wins comes from the fact that go can truly execute sql statements in parallel (whereas python's GIL serialized these parallelizable operations). In general, single-threaded go is at least 5x faster than pure python (without c-module)."
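
For anyone who hasn't seen what that looks like on the Go side, here is a minimal sketch of issuing several independent calls from goroutines and waiting for all of them. It is not the OP's code: `fakeQuery` and `runConcurrently` are hypothetical names, and a sleep stands in for the database round trip, so it only illustrates the pattern and says nothing about real latency.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeQuery is a hypothetical stand-in for a blocking SQL call.
// It only sleeps for delayMs milliseconds and returns a label.
func fakeQuery(name string, delayMs int) string {
	time.Sleep(time.Duration(delayMs) * time.Millisecond)
	return name + ": done"
}

// runConcurrently starts one goroutine per query and waits for all of them.
// Each goroutine writes to its own slice index, so no lock is needed.
func runConcurrently(names []string, delayMs int) []string {
	results := make([]string, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			results[i] = fakeQuery(name, delayMs)
		}(i, name)
	}
	wg.Wait()
	return results
}

func main() {
	names := []string{"users", "files", "quotas"}
	start := time.Now()
	results := runConcurrently(names, 50)
	// The three simulated calls overlap, so the total wall time should be
	// close to one call rather than the sum of all three.
	fmt.Println(results)
	fmt.Println("elapsed:", time.Since(start))
}
```

With a real database you would call your driver inside the goroutine instead of `fakeQuery`; whether that actually helps depends on the queries being independent and the database having capacity to spare.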

Go over Python may not be a magic bullet, but it's a damn useful tool nevertheless. While Go does not magically scale to datacenter-level systems, Python cannot, without significant work, use even the resources on a laptop. There's a huge range of problems that are larger than 1 core and smaller than 48 cores (or 72 cores, or however many are in your largest server). And as the sibling comments have mentioned, straight-line single-thread performance is not irrelevant. Starting with a language that's slower than Perl isn't a good beginning.

You're right regarding larger system architectures where it's more about the services and less about what language each is written in.

1) Regarding concurrency, Go really pushes you towards writing things in a scalable way with goroutines and channels, while still giving you mutexes for when those are the best fit.

2) Regarding single-threaded performance, this does indeed start mattering once you un-bottleneck yourself on the architecture and concurrency fronts. 250 boxes are cheaper than 500.
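
To make point 1 concrete, here is a rough sketch of both styles side by side: a small worker pool that passes work over channels, and a counter guarded by a mutex. The names (`sumSquares`, `counter`, `countTo`) are made up for illustration, and the sketch assumes `workers` is at least 1.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares fans the inputs out to a fixed number of worker goroutines over
// a channel and sums the squares that come back on a results channel.
// Assumes workers >= 1.
func sumSquares(nums []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then close the channel so workers leave their range loops.
	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

// counter is the mutex alternative: shared state guarded by a lock.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countTo increments a shared counter from n goroutines and returns the total.
func countTo(n int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	return c.n
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 2)) // 1 + 4 + 9 + 16 = 30
	fmt.Println(countTo(100))                     // 100
}
```

Channels tend to fit when work flows from one stage to another; a mutex is usually simpler when several goroutines just need to update one small piece of shared state.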

You're right, but consider this: they managed to rid themselves of Python's GIL by not using Python. If they can actually reap the benefits of parallelism, it means a lot for scalability.