by pepesza 3639 days ago
I think that not using Erlang in this particular case was a mistake. Erlang runs some of the largest chat systems out there, including League of Legends and WhatsApp. They would have avoided all the hassle of GC pauses, since Erlang has per-process garbage collection. And scaling a single OS process to their number of connections was done for Erlang machines years ago.

Hi there, I'm one of the original engineers who worked on our re-implementation of chat which ended up in Go.

We've a culture of being willing to try new things at Twitch. When our twisted-python chat system no longer met our need to be easy to iterate on, we decided to rebuild it. It was a monolith, and we decided to chunk it up to reflect the needs of our users and the pace at which we could develop new features. Notably, we wanted to not recycle TCP connections whenever a new feature was added (which was a shortcoming of the twisted-python solution, along with a bunch of global state that was becoming hard to reason about). As part of this rework we had a pub-sub portion which was super simple, and we decided to try out this new, exciting language with a lot of promise on it. It worked amazingly well. Over the course of another year or so we ended up rebuilding all of the components in Go.

When we first evaluated rebuilding chat we assessed a few options:

- python

- nodejs (we started with this, but random crashes and poor tooling at the time didn't work for us)

- erlang (notably, whether we could use ejabberd as the hub of the system)

Ultimately we chose python because we knew python and we needed this to work right now. The move to Go happened incrementally thereafter and was driven by:

- increase in trust

- great tooling

None of this can be pitched as "Go vs X"; it was purely a tools- and expediency-oriented set of decisions.

> Notably we wanted to not recycle TCP connections whenever a new feature was added

So with the Go server, you're able to redeploy without closing open connections? Do you just run multiple versions in parallel and load balance over to the new version once connections close, or something else?

There are actually two (or more) different services. One sits and talks to the users via TCP and maintains the IRC connection state; it then makes back-end calls to the part that makes decisions and publishes information.

This allows us to almost never deploy changes to the first service, while frequently making changes to the second system. Of course when you do want to make changes to the first you have to reestablish all the TCP connections again, but if you engineer it correctly you can do it infrequently enough to be worthwhile.

Disclaimer: I don't actually work on the chat team, this is based upon various conversations with people on the chat team and may be incorrect in some specifics or out of date.
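For readers who want to picture the split described above, here is a minimal sketch in Go. It is not Twitch's code: the `Backend` interface, `upperBackend`, `serveConn`, and `roundTrip` are hypothetical names made up for illustration, and the "back end" here is an in-process call rather than a separate service. The point it shows is that the edge only owns the connection and forwards each line, so the logic behind the interface can change without touching the connection-holding code.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// Backend is a hypothetical stand-in for the frequently deployed service
// that makes the decisions. In a real system this would be a network call.
type Backend interface {
	Handle(line string) string
}

// upperBackend is a toy implementation used only for this sketch.
type upperBackend struct{}

func (upperBackend) Handle(line string) string { return strings.ToUpper(line) }

// serveConn is the edge side: it owns the long-lived client connection,
// reads one line at a time, and delegates every decision to the backend.
func serveConn(conn net.Conn, b Backend) {
	defer conn.Close()
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		fmt.Fprintf(conn, "%s\n", b.Handle(scanner.Text()))
	}
}

// roundTrip sends one line through an in-memory connection (net.Pipe)
// and returns the reply, so the sketch runs without opening a socket.
func roundTrip(b Backend, line string) string {
	client, server := net.Pipe()
	go serveConn(server, b)
	defer client.Close()
	fmt.Fprintf(client, "%s\n", line)
	reply, _ := bufio.NewReader(client).ReadString('\n')
	return strings.TrimSpace(reply)
}

func main() {
	fmt.Println(roundTrip(upperBackend{}, "hello chat"))
}
```

Swapping `upperBackend` for another implementation changes behaviour while `serveConn` stays untouched, which is the property that lets the edge be redeployed rarely.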

Yes, Dobbs captures this here. To be clear, the first re-write of the chat service was from twisted python into tornado python. In that re-write we produced a number of services which implemented the biz logic. One of those services was a TCP terminating edge server which has very little logic in it besides how to call the biz logic and send messages to connected clients. Once this was all written we converted to Go incrementally.
> We've a culture of being willing to try new things at Twitch

How is that different from NIH syndrome?

How about if you have a culture of being willing to try new things not invented here? That would be quite different from NIH syndrome.
being willing to try new things != trying things because they're new
Finding an Erlang programmer available on site within 1-2 months is the hardest part of deciding to go with Erlang. With Go you can take a C++/Python programmer and have them writing production code pretty soon. I think this is what inhibits functional programming in general: the learning curve, bundled with the amount of work around, prevents people from jumping on board. Also, the unwillingness of some employers to hire someone without a ton of experience with Erlang makes it difficult for a senior programmer to switch.
Truthfully, I think the supply of programmers with functional programming skills far outstrips the demand. I use FP languages on personal projects (most recently Clojure) all the time, but the day job is still programming in Java at a shop that is all Java, all the time.

Every time I suggest bringing a language like Scala or Clojure into the mix (where they would provide real benefits over Java), I always get the "And where will we find programmers to maintain the code you write?" line from management. The answer, of course, is that there are likely legions of programmers like me, who hack around with FP languages in their spare time but whose only 'professional' experience is in mainstream languages.

I suspect the real reason is that most management is just too risk-averse to consider using technology that isn't mainstream.

That's changing now, thankfully. Most people at the company I work for use Clojure every day.
Why don't you tell them that?
Two Erlang shops I know of never had problems finding Erlang devs. Competent devs will pick it up if they are interested in it even if they haven't done it full time before.

I did that to some extent.

> i think this is what inhibits functional programming in general

Yes, functional aspect was the harder part to learn. It wasn't the syntax, which is what most people mention.

The other part that is hard is learning to use the concurrency construct: actors. But Go has the same problem; solving problems with goroutines and channels is just as much of an impedance mismatch as using actors.
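For anyone who has not seen the Go side of that comparison, here is a small sketch of what "goroutines and channels" look like in practice, using a toy pub-sub hub. The `Hub` type and its methods are hypothetical and written only for this illustration; they are not from any real chat system.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub is a hypothetical in-process pub-sub hub; the name and API are
// illustrative only.
type Hub struct {
	mu   sync.Mutex
	subs map[string][]chan string
}

func NewHub() *Hub { return &Hub{subs: make(map[string][]chan string)} }

// Subscribe registers a buffered channel for topic and returns its receive side.
func (h *Hub) Subscribe(topic string) <-chan string {
	ch := make(chan string, 16)
	h.mu.Lock()
	h.subs[topic] = append(h.subs[topic], ch)
	h.mu.Unlock()
	return ch
}

// Publish offers msg to every subscriber of topic without blocking and
// returns how many accepted it. A full subscriber buffer drops the message.
func (h *Hub) Publish(topic, msg string) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for _, ch := range h.subs[topic] {
		select {
		case ch <- msg:
			delivered++
		default: // slow subscriber: drop rather than stall the publisher
		}
	}
	return delivered
}

func main() {
	hub := NewHub()
	room := hub.Subscribe("room1")

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // one goroutine per subscriber
		defer wg.Done()
		fmt.Println("got:", <-room)
	}()

	hub.Publish("room1", "hello")
	wg.Wait()
}
```

Unlike an Erlang mailbox, nothing here is supervised or addressable by name; the channel has to be passed around explicitly, which is part of the adjustment being described.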

Actually, the hardest part of deciding to go with Erlang is not finding a programmer within 1-2 months, but deciding that one will need to train the programmer themselves (which obviously takes time; it took me half a year dabbling with Erlang and OTP (totally on my own) to actually start writing idiomatic code).
The trickiest bit about Erlang is not really FP - you can figure that out decently well in short order - it's OTP and processes and how all of that fits together, and how to do a good job with architecting everything.

Finding people should not be too hard. We were able to find a guy in Padova, Italy, who dove in and got started without much trouble as I was leaving.

>> With go you can take a C++/python programmer and have them writing production code pretty soon

In my experience, learning Go (and by that I mean fully grasping the ways of Go: goroutines, channels, selects, interfaces, type switches, etc.) takes at least a year for someone whose background is C/Python/Ruby. Then again, maybe it's just me.

I think this number is different for everyone. It depends on your background, years of experience, knowledge of computer science (not how to write in a specific language), etc.

I am a polyglot (I write in different languages) and it took me a couple of weeks to master Go.

Read "The Go Programming Language" book; it's really well written and it touches on everything you need to know about Go (or most of it).

It totally took a year to really internalize goisms. But it took a week or so to pick up the basics, and some really deep code reviews and a bit of pair programming with people I respect, including the author of this article, helped me quickly learn most of the low-hanging goisms and a large portion of the standard library.

The time to go from "Python dev who has never touched Go" to working on a Go codebase is measured in weeks in my experience.

It took me a good 3 months of fighting with the language before it started to click for me, and then another 3 months to really start using it properly. It sucks to always read things like "I was able to write production code on the first day!" around here.
I think that depends on the mindset of how you see computer languages. I like learning new languages, every time I encounter some new language, I have to try it out and get at least something basic working.

I'm pretty decent at C, C++, Python and Bash scripting, have participated in larger projects in Java, Perl, Pascal/Delphi and Ruby, and have toyed around with Rust, Haskell, Clojure, Angelscript, Crystal, Lua and probably a bunch more that I forget.

Go for me was a breeze; everything just clicked. It helps that it got a lot of its inspiration from other languages I already knew pretty well. When I started toying around with Haskell, for example, this wasn't the case: it took me quite a while to get up and running with the basics, and I still don't think I know basic Haskell. Go, on the other hand, was easy, and within a week I was diving into the stdlib source code.

I've been programming Go for almost 2 years and I routinely get stuck trying to figure out the "right" way to do something in the language.

For reference, I felt comfortable in Java, Scala, C# and Perl all faster than Go.

Possibly in the past, yes. But if they're now just paying 1ms in GC every so often, the advantage is now gone. Go is generally faster than Erlang (in Go-native vs. Erlang-native code) so the system is quite possibly net outperforming what Erlang can do now. 1ms is just noise when packet latency jitter is higher than that.
That is not the point. The advantage of Erlang is not raw speed, but the sheer number of language constructs helping you write a distributed system without thinking too much about low-level stuff.

If I had to do some parallel data crunching, I would probably use Go or something similar. To write an actual system, it's much easier just to stand on the shoulders of the Erlang guys instead of developing everything by hand (i.e. a whole supervision tree).
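To give a feel for what "developing a supervision tree by hand" means, here is a deliberately tiny sketch in Go. The `supervise` function is hypothetical and covers only one thing, restarting a single crashed worker a bounded number of times. Everything else OTP provides (restart strategies, linked processes, restart intensity windows, trees of supervisors) is left out, and that gap is the commenter's point.

```go
package main

import "fmt"

// supervise runs worker in its own goroutine and restarts it if it panics,
// up to maxRestarts times. It returns the number of restarts performed and
// whether the worker finally returned normally. This is a hand-rolled,
// hypothetical helper, not an equivalent of an OTP supervisor.
func supervise(worker func(attempt int), maxRestarts int) (restarts int, ok bool) {
	for attempt := 0; attempt <= maxRestarts; attempt++ {
		done := make(chan bool)
		go func() {
			defer func() {
				// recover turns a crash into a signal to the supervisor.
				done <- recover() == nil
			}()
			worker(attempt)
		}()
		if <-done {
			return attempt, true
		}
	}
	return maxRestarts, false
}

func main() {
	// A flaky worker that crashes on its first two attempts.
	flaky := func(attempt int) {
		if attempt < 2 {
			panic("simulated crash")
		}
	}
	restarts, ok := supervise(flaky, 5)
	fmt.Println("restarts:", restarts, "ok:", ok)
}
```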

Well, there's one less thing for Go programmers to worry about now.

The problem with being a language advocate is that when "competing" languages improve you start thinking it's bad news for your side. I try to avoid that. If Go improves it's good news for everyone, if only because we all benefit from stronger competition.

I do not feel like a language advocate. Go is a great language for a number of applications and its growing popularity shows it. I haven't had time yet to tinker with it myself, but I certainly intend to.

My point was that the strength of Erlang/OTP was never in processing power or speed, but in designing the runtime so it actually solves most problems regarding distribution. Go, as far as I understand it, was created with a different goal in mind: to enable fast and parallel processing. That does not make it better or worse, just different. What I'm saying is that solving the garbage collection issue (and only partially, while we're at it) is not what makes it competitive in comparison to Erlang, because Erlang was designed with a totally different goal in mind.

The advantage of language constructs helping you, but the disadvantage of finding devs to actually build with them.
I learnt Elixir myself, on the job. It's certainly doable. Actually, if I ever had to hire developers, I wouldn't care at all about their previous language experiences. The only thing you achieve asking for n years experience in y is discouraging talented people who worked on something else.

The thing is, you choose the right tech for the job and then reserve some time to bring everybody up to speed. From my own experience it is much cheaper than trying to use already-known tech in ways it was not designed to work.

> The advantage of language constructs helping you, but the disadvantage of finding devs to actually build with them.

The flip side of that coin is you're liable to get someone reinventing the wheel - poorly - in whatever language doesn't have all those goodies.

Yep, Greenspun's tenth rule comes to mind.
In the Erlang world, we call that Virding's rule:

"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang."

Edit: I'll add, though, that the Go people are pretty smart and seem like they're doing good things, so I wouldn't be too complacent in thinking Erlang is the only game in town. It still does get some things right that are hard to replicate in Go, though.

I would emphasize that, in many of those examples, Go wasn't a very viable choice when the apps were originally written. Twitch chose Go back at ~1.2 (2013), when Erlang might have made more sense.

For companies making a similar decision today, that argument is a bit different. Go 1.6/1.7 obviously has massive improvements in the areas the article outlines. But, in the Erlang camp, we have Elixir making that more enticing.

I would argue Twitch made the right choice. They will have an order of magnitude easier time finding devs to support a Go system than an Erlang system. And their product never suffered for it. And they are clearly a force behind making Go better, which has helped more people than just them.

This propagates a myth that the choice of language is the bottleneck in a complex program. Twitch hires engineers who are good enough not to be constrained by the difficulties of learning a particular language.

Odds are, Twitch wasn't trying to optimize over a long time horizon when they chose Go. They were once a scrappy startup, surely accumulating technical debt left and right to get product features out. Go was likely a locally optimal choice.

Twitch chat also requires heavy string processing, and that's an arena where, if I had to guess, Go has an edge over Erlang.

It's not really heavy string processing, it's all just replacing inner strings (not even Regex).
Sure. I think we're talking about different things when we say heavy. Perhaps less ambiguous phrasing would have been "frequent string manipulation."
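As a rough idea of what "replacing inner strings, not even regex" can look like in Go, here is a sketch using the standard library's `strings.NewReplacer`. The emote codes and the replacement markup are invented for the example and are not Twitch's actual format.

```go
package main

import (
	"fmt"
	"strings"
)

// emoteReplacer maps hypothetical emote codes to hypothetical markup.
// A Replacer is built once and is safe for concurrent use afterwards.
var emoteReplacer = strings.NewReplacer(
	":smile:", "<emote id=1>",
	":wave:", "<emote id=2>",
)

// render substitutes every known emote code in msg in a single pass.
func render(msg string) string {
	return emoteReplacer.Replace(msg)
}

func main() {
	fmt.Println(render("hi :wave: all :smile:"))
	// prints: hi <emote id=2> all <emote id=1>
}
```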
There are likely other tradeoffs. This (GC pause times) is probably not the only criterion, nor even the most important. It's really hard to draw a conclusion based on such limited information.
Go 1.6 GC is probably faster than Erlang GC now.
OK, having posted something defending Go in this thread, now let me exasperate everyone by going the other way. Because Erlang's GC is per Erlang-process whereas Go's GC is still OS-process-global, there really isn't a "faster than/slower than" comparison available, because their workloads are so dissimilar. When an Erlang GC runs, it may be running across a mere few hundred or thousand bytes, freezing only that one Erlang-process that was quite likely not running at the moment anyhow. Erlang also has the GC-time advantage that it doesn't have pointers, so there's no pointer-fixup penalty. (It may be a disadvantage at other times, but it's certainly an advantage at GC time.)
Golang's more recent async GC changes begin to resemble Erlang's per-process GC in how they would affect overall system performance.

When people talk about Go's GC freezes, they're talking about the spinup/spindown time before the async GC kicks in. That part is incomparable to Erlang, but it's a part which has gotten much faster recently, specifically by virtue of becoming smaller.

> Golang's more recent async GC changes begin to resemble Erlang's per-process GC in how they would affect overall system performance.

They resemble generational GC more than anything. Generational GC has some of the advantages of Erlang (though I think the traditional HotSpot generational GC will end up working better than the one Go is going with) in the minor collections, but not in the major collections.

Go is adding to per-process heaps (there will still be a global heap).
Further complicating things are large binaries, which are handled in yet another way in Erlang. That might be an issue for some situations, and not at all a concern in others.
Do you have any benchmarks showing that running small per-process GC collections is faster than collecting one big heap?
Define "faster".

I actually have experience that looping over all Erlang processes and running a GC on each of them is definitely human-clock-time orders of magnitude slower than a Go garbage collection across a similar set of data. But who cares? First of all, that was a bit of a desperation play on my part anyhow, run for diagnostic purposes in the REPL, not an operation you do all the time, and secondly, only one process at a time was frozen then anyhow, so I didn't care that it took about 10 seconds. It didn't take my service down.

Which was my point in the first place, that "faster" and "slower" don't really apply here, because what they're doing is so different from each other. There are too many different possible definitions of faster. And you have to be careful to use one that matters to your code, not just an artificial benchmark that shows your preferred choice in the better light.

(For those who may be curious, the problem that led me to that play was some now long-fixed issues with large binaries.)
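On the Go side, one definition of "faster" that can be checked in your own process is stop-the-world pause time, which the runtime exposes through `runtime.ReadMemStats`. The sketch below is only an illustration: the `gcPauses` helper is hypothetical, it forces collections with `runtime.GC()` rather than observing natural ones, and the numbers from a toy heap like this say nothing about a production workload.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcPauses forces n garbage collections and reports how many GC cycles
// completed and the total stop-the-world pause time accumulated meanwhile.
// The helper name and shape are made up for this example.
func gcPauses(n int) (cycles uint32, total time.Duration) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		_ = make([]byte, 1<<20) // create some garbage
		runtime.GC()            // blocks until the collection finishes
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC,
		time.Duration(after.PauseTotalNs - before.PauseTotalNs)
}

func main() {
	cycles, total := gcPauses(5)
	fmt.Printf("%d GC cycles, %v total pause\n", cycles, total)
}
```

In a real service you would sample `MemStats` periodically under actual load instead of forcing collections.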

Also important is the fact that short-lived processes usually don't ever need a GC on their heap. When they are finished they just free the full heap. For a web service this is very useful.
Stop the world vs stop one country while the rest of the world carries on.
According to the article they chose it because of "Its simplicity, safety, performance, and readability". Perhaps it has more of some of those than Erlang does/did?
Isn't Facebook's chat also powered by Erlang?
The Erlang components of Facebook Chat were replaced with C++ several years ago.

The two main reasons I remember are A) it was hard to maintain a group of engineers with acceptable competency in Erlang over time and B) the C++ code offered faster and more consistent performance, albeit with somewhat less scalability in terms of sessions per host. We just added X% more servers to the channel pools and were happy to have the chat services in a language where more FB engineers could contribute. There have been a lot more changes in the architecture than just moving to C++ though, so it's hard to do a direct comparison between the products.

This doesn't take anything away from WhatsApp though, who has built a strong product and infrastructure on top of Erlang.

It definitely was when it originally launched. After that I have no idea.
I'm glad they did. Sounds like they have helped push the development of Go along which is good for everyone.