by throwaway290 1345 days ago

My point 1 was specifically phrased to clarify how it could have been more reliable before the migration to FB: it did not have to deal with the same load back then. You said nothing to show it was not a correlation.

Your point 2 sounds like there were additional factors that could have influenced the reliability besides the load (they didn't simply migrate to FB infra but also switched to Signal).

> The broadcasting is important, yes, but it is very heavily biased towards reads over writes, so something like Cloudflare would solve 99% of the load.

There's push-notifying millions of devices within seconds after a celeb or a major news source tweets. There's tracking view and engagement stats on that in realtime. There's making sure a tweet is not available to any of those within seconds after it's been deleted or moderated. There're separate back-office apps for moderating that firehose of content. And that's just what I can see from the outside. An e2ee instant messenger with size-limited chat groups doesn't even come close.

Please don't say "just stick a CDN on top of it and you are 99% there", it's embarrassing (and not to Twitter). This will maybe get you 80% there if your goal is "a microblogging platform" but not even 20% if your goal is being both the go-to news source and shitpost forum for people worldwide, reliably working even in sensitive times and emergencies. Twitter used to be a microblogging platform back when it had far fewer employees, and you'd see a fail whale regularly even though it had far fewer active users. In recent years it's a completely different beast, and saying increased headcount is unrelated is amusing.

snovv_crash 1345 days ago

WhatsApp has scaled less than 10x since acquisition. They used to handle ~3M open TCP connections per server, and as a result could run their entire operation with under 300 servers.

The push notification argument is also overstated. Sharding and fan-out solves the burstiness. And people overall receive a similar number of messages (and thus push notifications) from WhatsApp as Twitter. Besides, these days the push notifications go through Google/Apple servers anyways to reduce the number of open connections needed on the phone side.

Then there are DMs. They are per person so CDNs don't help much (just static assets), but also they shard basically perfectly. So, shard them.
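As a rough illustration of why per-conversation data shards so cleanly, here is a minimal Python sketch. The function name `shard_for_dm`, the shard count and the use of SHA-256 are all assumptions made for the example, not anything Twitter or WhatsApp is known to use.

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count


def shard_for_dm(user_a: int, user_b: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a two-party conversation to a shard index.

    Sorting the pair makes the key order-independent, so both
    participants resolve to the same shard.
    """
    low, high = sorted((user_a, user_b))
    key = f"{low}:{high}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because a conversation only ever touches its own key, no request needs to cross shards. Note that plain modulo remaps most keys when the shard count changes; a consistent-hashing scheme would be one way to soften that.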

Which in the end leaves the user feeds. Designed correctly, sharding would work extremely well, and what doesn't work could be handled by caching closer to users for those 1k most popular accounts.
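One common way to read "shard the feeds, cache the most popular accounts" is a hybrid fan-out: ordinary accounts push posts into follower inboxes at write time, while popular accounts are stored once and merged in at read time. The sketch below is an in-memory toy under that assumption; `FeedStore`, `POPULAR_THRESHOLD` and `FEED_LIMIT` are hypothetical names and values, and a real system would spread these maps across shards and caches.

```python
from collections import defaultdict, deque

POPULAR_THRESHOLD = 3   # hypothetical; tiny so the example is easy to follow
FEED_LIMIT = 100        # hypothetical maximum feed length


class FeedStore:
    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        # user -> posts pushed at write time (fan-out on write)
        self.inbox = defaultdict(lambda: deque(maxlen=FEED_LIMIT))
        # popular author -> posts stored once (fan-out on read)
        self.hot_posts = defaultdict(lambda: deque(maxlen=FEED_LIMIT))
        self.clock = 0                      # stand-in for a timestamp

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_popular(self, author):
        return len(self.followers[author]) >= POPULAR_THRESHOLD

    def post(self, author, text):
        self.clock += 1
        item = (self.clock, author, text)
        if self.is_popular(author):
            # Popular account: one write, readers merge it in later.
            self.hot_posts[author].append(item)
        else:
            # Ordinary account: copy into each follower's inbox now.
            for user in self.followers[author]:
                self.inbox[user].append(item)

    def feed(self, user):
        items = list(self.inbox[user])
        for author in self.following[user]:
            if author in self.hot_posts:
                items.extend(self.hot_posts[author])
        # Newest first, by the clock value in each tuple.
        return sorted(items, reverse=True)[:FEED_LIMIT]
```

The trade-off is that a post from a popular account costs one write instead of one per follower, at the price of a small merge on every read, which is the part a cache close to users would absorb.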

Honestly, with the correct architecture, languages and tooling, it could be handled by an experienced 50 person dev team plus another hundred in ops. Obviously Twitter doesn't have the perfect setup, so maybe an order of magnitude more? And if you throw a bunch of subpar engineers and tooling at the problem, nothing can dig you out of inefficiencies at this scale anyways.

And no, I'm not wildly optimistic here. StackOverflow still runs off of 9 on-prem servers [0]. I've seen message queues that can deliver 200M notifications per second on a single machine (written in C++, for HFT). This stuff is hard, yes, but throwing more bodies at it doesn't help once your fundamentals are solved.

0. https://www.datacenterdynamics.com/en/news/stack-overflow-st...

throwaway290 1344 days ago

> WhatsApp has scaled less than 10x since acquisition. They used to handle ~3M open TCP connections per server, and as a result could run their entire operation with under 300 servers.

They switched to a new protocol and grew from 200 million to, I guess, about a billion users since 2013. If you believe a team of 50 developers could deal with this without causing extensive downtime and service disruption along the way, I pray you never ever manage software engineers.

> Sharding and fan-out solves the burstiness.

Great, at least it's no longer "just add CDN to solve 99%" here;)

> And people overall receive a similar number of messages (and thus push notifications) from WhatsApp as Twitter.

Yeah, again, WhatsApp has many users, but as an engineer you just don't ever have to worry about delivering a message instantly to more than 32 people (512 as of this year), and you never have to account for moderating any of that because it's e2ee and there are no adverts next to the messages. It's basically dumb pipes terminated by one native client. Twitter has to maintain a mix of automated and human review of all UGC, and it is accessible via extensive APIs and a search-engine-indexed web app in addition to the native client.

> Then there are DMs

Let's ignore Twitter's DMs, even without them it's far more complex and demanding than an IM app.

> StackOverflow still runs off of 9 on-prem servers [0].

Yeah, and SO's maintenance page or read-only mode is up about once a month and lasts dozens of minutes. What are you even talking about now, bringing up a niche programmer-oriented help forum for comparison here?

You may be stuck in the times where Twitter was a RoR-based microblogging platform. It's not been that for years.

snovv_crash 1343 days ago

I do manage software engineers, focusing on HPC (image processing primarily), and one thing I consistently see from people who work with 'classic' web tech is underestimating what modern hardware can do.

This isn't 2005 anymore. We have multiple parallel 40Gb LAN links, 64 cores per socket, 2MB of L2(!!!) cache per core, and a full terabyte of RAM (!!!) per server. If you program with anything that lets you build cache-aware data structures and avoid pointer chasing, your throughput will be astounding and latency will be sub-millisecond. How else do you think WhatsApp managed 3M clients connected per server without having just the in-flight messages overflowing memory, on top of all the TCP connection state?
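To put a number on the memory argument, here is a back-of-the-envelope calculation in Python. It uses the ~3M connections-per-server figure cited earlier in the thread and reads "a full terabyte" as 1 TiB; pairing a present-day RAM size with an older connection count is an assumption made purely for the arithmetic, and `budget_per_connection` is a hypothetical helper.

```python
def budget_per_connection(ram_bytes: int, connections: int) -> int:
    """Bytes of RAM available per connection if memory is split evenly."""
    return ram_bytes // connections


RAM_BYTES = 1024**4        # 1 TiB, assumed reading of "a full terabyte"
CONNECTIONS = 3_000_000    # ~3M open TCP connections per server, per the thread

per_conn = budget_per_connection(RAM_BYTES, CONNECTIONS)
print(f"{per_conn / 1024:.0f} KiB per connection")  # roughly 358 KiB
```

A few hundred KiB per client leaves plenty of room for socket state plus a queue of small in-flight messages, which is the point being made about what one large machine can hold.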

Things only get slow when scripting languages, serialisation, network calls and neural networks get involved. (AKA "I don't care if you want docker, a function call is 10000x faster than getting a response over gRPC and putting that in the hot loop will increase our hardware requirements by 20x.")

The more distributed your architecture, the more network overhead you introduce and the more machines you need. Running the WhatsApp way with fewer, higher-performance servers simply scales better. Just from the hardware improvements since 2013 there was no reason for WhatsApp to change their architecture as they grew.

And if you think rolling out a new protocol while maintaining backwards compatibility is hard and somehow adding more people will help, I have a team of engineers from Accenture to sell you. I did this straight out of university, to thousands of remote devices, over 2G networks, with many of the devices being offline for months in between connections. You just need a solid architecture, competent people and (I can't stress this enough) excellent testing, both automated and manual. And the team that did this was 6 engineers, and this wasn't their only responsibility.
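A minimal sketch of the kind of version negotiation that keeps old clients working during such a rollout, written in Python. The version numbers and the names `SERVER_SUPPORTED` and `negotiate_version` are hypothetical; this is not a description of the system mentioned above or of WhatsApp's protocol.

```python
SERVER_SUPPORTED = {1, 2, 3}  # hypothetical protocol versions the server still speaks


def negotiate_version(client_versions, server_versions=SERVER_SUPPORTED):
    """Return the highest version both sides support, or None if there is none."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None
```

A device that has been offline for months advertises only its old versions and still gets served, as long as the server keeps those versions in its supported set until the fleet has upgraded.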
