by tpmx 2454 days ago
Sorta related:

I managed a team that built a 5x 1000 node distributed setup 10+ years ago.

We ended up going with

a) short DNS TTL + a custom DNS server that sent people to the closest cluster (with some intra-communication to avoid sending people to broken clusters)

b) in each cluster, three layers: 1) Linux keepalived load balancing, 2) our custom HTTP/TLS-level load balancers (~20 nodes per DC), 3) our application (~1000 nodes per DC)

A typical node had 24 (4x6) CPU cores when we started and 48 (4x12) towards the end.

These were not GC/AWS nodes; we were buying hardware directly from IBM/HP/Dell/AMD/Intel/SuperMicro and flying our own people out to mount them in DCs that we hired. Intel gave us some insane rebates when they were recovering from the AMD dominance.

Load-balancing policy: we just randomized targets, but kept sticky sessions. Nodes were stateless, except for shared app properties. For those we built a separate global, DC-aware distributed key-value store, loosely based on the concept of AWS Dynamo; that was a whole new thing 12 years ago. App nodes reported for duty to the load balancers when they were healthy.
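To make the policy concrete, here is a rough Python sketch of random target selection with sticky sessions, where app nodes register themselves once healthy. The class and method names (LoadBalancer, report_healthy, pick) and the in-memory session table are assumptions for illustration, not the actual implementation.

```python
import random


class LoadBalancer:
    """Hypothetical sketch: random targets, sticky sessions, self-registering nodes."""

    def __init__(self, rng=None):
        self.healthy = set()      # app nodes that have reported for duty
        self.sessions = {}        # session id -> pinned app node
        self.rng = rng or random.Random()

    def report_healthy(self, node):
        # An app node announces that it is ready to take traffic.
        self.healthy.add(node)

    def report_unhealthy(self, node):
        # Node withdrew or failed a check; stop routing to it.
        self.healthy.discard(node)

    def pick(self, session_id):
        # Sticky: reuse the pinned node while it is still healthy.
        node = self.sessions.get(session_id)
        if node in self.healthy:
            return node
        if not self.healthy:
            raise RuntimeError("no healthy app nodes")
        # Otherwise pick a random healthy node and pin the session to it.
        node = self.rng.choice(sorted(self.healthy))
        self.sessions[session_id] = node
        return node
```

Since the nodes are stateless, re-pinning a session after a node drops out is cheap in this sketch: the next request just lands on another random healthy node.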

We had a static country-to-preferred-DC mapping. That worked fine at this scale.
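A minimal sketch of what the DNS server's decision could look like, assuming a static country table plus a set of clusters currently believed healthy. The DC names, the table contents, the fallback order and the function name resolve_dc are all made up for illustration.

```python
# Hypothetical static mapping; the real table and DC names are not given above.
PREFERRED_DC = {"NO": "eu-north", "US": "us-east", "IN": "asia-south"}

# Assumed fallback order when the preferred cluster is marked broken.
FALLBACK_ORDER = ["eu-north", "us-east", "asia-south"]


def resolve_dc(country, healthy_dcs, default="us-east"):
    """Return the cluster a client from `country` should be sent to."""
    preferred = PREFERRED_DC.get(country, default)
    if preferred in healthy_dcs:
        return preferred
    # Preferred cluster is down: send the client to the first healthy one.
    for dc in FALLBACK_ORDER:
        if dc in healthy_dcs:
            return dc
    raise LookupError("no healthy cluster")
```

With a short TTL on the answers, clients re-resolve quickly, so a cluster dropped from healthy_dcs stops receiving new traffic soon after.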

This setup worked fine for a decade and 250M+ MAUs. We had excellent availability.

At some point like 10 years ago a kinda well known US-based board member really, really wanted us to move to AWS. So we did the cost calculations and realized it would cost like 8X more to host the service on AWS. That shut him up.

Different times. It's so much easier now with AWS/GC to build large-scale services. But also so much more expensive - still! I wonder how long that can last until the concept of dealing with computation, network and storage really becomes a commodity.


What in god's good name were you guys hosting that required 5,000 quad-socket physical hosts!?
A popular server-assisted mobile browser for crappy phones.

Basically one CPU second per web page. 150k pages/second @ peak. 5 million HTTP requests/s. 150 Gbit/s. The web for 250 million people.
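A quick back-of-the-envelope check of those figures against the 5 x 1000 nodes with 24 to 48 cores each. It assumes every core in the fleet is available to the application and that the peak rates all coincide, which is a simplification.

```python
pages_per_s = 150_000        # peak pages per second
cpu_s_per_page = 1.0         # "one CPU second per web page"
requests_per_s = 5_000_000   # peak HTTP requests per second
bits_per_s = 150e9           # 150 Gbit/s
nodes = 5 * 1000             # five clusters of ~1000 app nodes

busy_cores = pages_per_s * cpu_s_per_page              # cores kept fully busy at peak
requests_per_page = requests_per_s / pages_per_s       # HTTP requests per page
kbytes_per_page = bits_per_s / pages_per_s / 8 / 1000  # kB moved per page

cores_early = nodes * 24     # 4x6-core nodes
cores_late = nodes * 48      # 4x12-core nodes

print(f"busy cores at peak:  {busy_cores:,.0f}")
print(f"requests per page:   {requests_per_page:.1f}")
print(f"kB per page:         {kbytes_per_page:.0f}")
print(f"fleet cores:         {cores_early:,} to {cores_late:,}")
print(f"peak load vs. 48-core fleet: {busy_cores / cores_late:.0%}")
```

Under these assumptions the peak works out to about 150k busy cores, roughly 33 requests and 125 kB per page, against 120k to 240k cores in the fleet.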

Kinda insane numbers when I think about it now, still. (I left five years ago, after it peaked.)

Opera?
Hiptop?