by zorkian 2673 days ago
Discord infra engineer here -- this blog post needs an update! Since then we've scaled this system much more. :)

The Fortnite Official server has exceeded 100,000 concurrent users, Discord itself is way past that 5M concurrent number, we're now using Rust in certain places to make Elixir go faster, we've built a general purpose replacement to Process.monitor that scales a whole truckload more that we're open sourcing next week at Code BEAM SF... the list goes on.

There's a lot of fun stuff going on to try to make this system even more efficient and reliable, and there's still a lot to do. We run everything on a very small engineering team (there are 4 full-time engineers on the core infrastructure, and only about 40 engineers in the whole company) and we're always looking for a few more. Feel free to reach out to me (zorkian#0001 on Discord) if this blog post sounds up your alley!

9 comments

For the curious, xb95 was kind enough to tell me how they call Rust from Elixir - it's via Native Implemented Functions:

http://erlang.org/doc/tutorial/nif.html

> As a NIF library is dynamically linked into the emulator process, this is the fastest way of calling C-code from Erlang (alongside port drivers). Calling NIFs requires no context switches. But it is also the least safe, because a crash in a NIF brings the emulator down too.

Sounds like a pretty good use for Rust!

Wonder if they use rustler [0], which claims it cannot crash the BEAM. Looks pretty good.

[0] https://github.com/hansihe/rustler
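To make the "cannot crash the BEAM" idea concrete: the general technique is to catch Rust panics at the native boundary and hand back an error value instead of letting the panic unwind into the host VM. Here is a minimal, stdlib-only sketch of that technique. It is not Rustler's actual code, and `risky_sum` and `guarded_sum` are hypothetical names made up for illustration.

```rust
use std::panic;

/// Hypothetical stand-in for real NIF logic; panics on empty input to
/// simulate a bug in native code.
fn risky_sum(values: &[i64]) -> i64 {
    if values.is_empty() {
        panic!("empty input");
    }
    values.iter().sum()
}

/// Boundary wrapper: turns a panic into an `Err` so it never unwinds
/// into whatever runtime called us.
fn guarded_sum(values: &[i64]) -> Result<i64, String> {
    panic::catch_unwind(|| risky_sum(values)).map_err(|payload| {
        // Panic payloads are usually a &str or a String.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

fn main() {
    println!("{:?}", guarded_sum(&[1, 2, 3])); // Ok(6)
    println!("{:?}", guarded_sum(&[])); // Err("empty input")
}
```

Note that this only covers panics. A segfault from `unsafe` code or a NIF that runs too long can still hurt the VM, so "cannot crash" is best read as "safe Rust panics are contained".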

We do indeed. Rustler is very cool!
Sonny Scroggin of Bleacher Report gave a talk about writing NIFs in Rust (using Rustler) at last year's Code BEAM SF conference -- the video's on YouTube:

https://www.youtube.com/watch?v=F86JXaCPowo

Congrats! You say only about 40 engineers in the company... How big is the company and how big are your other departments relative to engineering?
Their engineering team seems pretty efficient, for 40 people. I'm curious how they're organized, what they look for in hires, their release pipeline, etc.
Still reading the article, and I'm enjoying it so far.

My only complaint is the design of your blog. The header and the footer (to subscribe on Medium) take up so much space that there is relatively little room left to read the actual content.

https://imgur.com/a/pBUmIvp

I can recommend the browser extension "Make Medium Readable Again":

https://addons.mozilla.org/en-US/firefox/addon/make-medium-r...

This add-on can "Access your data for all websites".

That seems both risky and somewhat overkill considering its features. Does Firefox not support targeting a specific domain yet? Or is part of the problem that Medium allows custom domains (it does, right?).

Firefox does support targeting specific domains, so the add-on specifically chose to apply to all domains by writing "https://*/*" in "permissions" in manifest.json. It probably asks for this permission because of custom domains, as you theorize.

I can see in the extension source (thanks to https://addons.mozilla.org/en-US/firefox/addon/crxviewer/) that on every page, the extension uses JavaScript to check for a top nav bar or a login nag popup and hide them if present, then applies CSS that hides five other UI elements if they are present.

Am I correct in assuming that if the add-on was not manually installed, it could be updated at any time to include malicious code? Or is that just Chrome's behavior perhaps?
I wonder if there could be a uBlock/AdBlock filter made for Medium in general to block all this. I dunno the format of filter files, but it was easy to add a uBlock rule for a particular element using the UI.
Isn't this the same for all Medium blogs?
You are right. :(
I hate this too but I think it's intentional - as once you sign in the banner will disappear on scroll
This makes me wonder what else they'll "invent" before it becomes completely unreadable
Also, a second question - in the conclusion, it says:

> Choosing to use and getting familiar with Erlang and Elixir has proven to be a great experience.

What background did the core infrastructure engineers have before tackling Discord in Elixir and Erlang?

How is WebRTC handled at scale? Seems most WebRTC server software doesn't perform that well, and Discord I believe is using TURN to proxy it all, so that's gotta be a lot of data flowing.
The tl;dr is we wrote our own highly efficient SFU, and we used our own transport layer that is not DTLS, but XSalsa20 with Poly1305 over UDP.
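For anyone unfamiliar with the term: an SFU (selective forwarding unit) relays each participant's packets to the other participants without decoding or mixing them. A minimal in-memory sketch of that fan-out, assuming made-up names (`Sfu`, `join`, `forward`) and leaving out the UDP sockets and encryption entirely, might look like the following. This is a generic illustration, not Discord's implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical participant identifier.
type ParticipantId = u32;

/// Toy SFU: one inbox of raw packets per participant.
struct Sfu {
    inboxes: BTreeMap<ParticipantId, Vec<Vec<u8>>>,
}

impl Sfu {
    fn new() -> Self {
        Sfu { inboxes: BTreeMap::new() }
    }

    /// Register a participant with an empty inbox.
    fn join(&mut self, id: ParticipantId) {
        self.inboxes.entry(id).or_default();
    }

    /// Copy one packet, unmodified, to everyone except the sender.
    /// Returns how many copies were delivered.
    fn forward(&mut self, sender: ParticipantId, packet: &[u8]) -> usize {
        let mut copies = 0;
        for (id, inbox) in self.inboxes.iter_mut() {
            if *id != sender {
                inbox.push(packet.to_vec());
                copies += 1;
            }
        }
        copies
    }
}

fn main() {
    let mut sfu = Sfu::new();
    for id in [1, 2, 3] {
        sfu.join(id);
    }
    let delivered = sfu.forward(1, b"voice-frame");
    println!("delivered {} copies", delivered);
}
```

The point of the design is that the server does no per-stream media processing, so cost grows with packet fan-out rather than with codec work.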
> The Fortnite Official server has exceeded 100,000 concurrent users, Discord itself is way past that 5M concurrent number, we're now using Rust in certain places to make Elixir go faster.

Would this have been avoided had you started with Go or the JVM?

Who knows? We would have had a whole other set of issues, ones that Go and the JVM struggle with. There is more discussion in the rest of the comments on this post. BEAM/OTP provides a fantastic foundation for building distributed soft real-time systems, unlike anything any other programming language/framework/ecosystem provides.
Not sure why you are being downvoted - it seems like a genuine (good) question. Perhaps using Go could have removed the need for Rust, perhaps not.
Do you have remote positions?
Has Discord published any blog posts with technical details (configuration details) on how to tune various aspects of the infrastructure (from bare metal, to VM, to container, to the application) to get those numbers? I've seen similar blogs in the past from Cloudflare, StackExchange and a few others. Those are always a fun read.
I hope to read a write-up of your experience with Rust. I hope it's the leaner type.