Hacker News
Can Nginx servers scale to the level of Google?
11 points by donboscow 1240 days ago
Google uses millions of servers around the world to distribute its load evenly, and its web servers are custom Google software rather than standard ones like Apache or Nginx. Google's average load is around 40,000 queries per second, which works out to roughly 3.5 billion searches per day and 1.2 trillion per year. This of course also includes peak moments, like election results being announced somewhere or game scores being updated. Is Nginx, which is open source and freely available, capable of handling such a load (given the right hardware, of course)? Or is there something inherently lacking in Nginx, Apache, or other servers that made Google build its own server to handle its traffic? Could a set of Nginx servers distributed around the world at strategic locations deliver similar performance for a website with Google-level traffic?
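For reference, the per-day and per-year figures follow from the per-second average by plain multiplication. A minimal check, taking the 40,000 queries/second average at face value and assuming a 365-day year:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_YEAR = 365             # ignores leap years

def per_day(qps):
    """Convert an average queries-per-second rate to queries per day."""
    return qps * SECONDS_PER_DAY

def per_year(qps):
    """Convert an average queries-per-second rate to queries per year."""
    return per_day(qps) * DAYS_PER_YEAR

qps = 40_000
print(f"{per_day(qps):,} queries/day")    # 3,456,000,000 queries/day
print(f"{per_year(qps):,} queries/year")  # 1,261,440,000,000 queries/year
```

Keep in mind these are averages: capacity has to be sized for the peaks mentioned above, not for the mean.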
6 comments

Can you link sources for those Google numbers? I ask because I have seen smaller platforms, still with millions of users, where the average RPS was around 10-15k.

Nginx is quite scalable with the right configuration and hardware, but most of the time you want to run multiple instances of it to scale to this kind of traffic.
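As a rough illustration of "more instances", the number of instances a proxy tier needs can be estimated from the peak request rate, the rate one instance sustains, and some headroom. The per-instance capacity, the 2x headroom, and the function name below are hypothetical placeholders, not Nginx benchmarks; you would measure your own configuration and hardware.

```python
import math

def instances_needed(peak_rps, rps_per_instance, headroom=2.0, min_instances=2):
    """Estimate how many proxy instances to run (hypothetical helper).

    peak_rps: expected peak requests per second for the whole site.
    rps_per_instance: measured sustainable throughput of ONE instance
        (a placeholder here; benchmark your own setup).
    headroom: multiplier for traffic spikes and for instances lost to failures.
    min_instances: floor so that a single failure is not an outage.
    """
    if peak_rps < 0 or rps_per_instance <= 0 or headroom < 1:
        raise ValueError("invalid capacity inputs")
    needed = math.ceil(peak_rps * headroom / rps_per_instance)
    return max(needed, min_instances)

# Hypothetical inputs: 120k RPS peak, 25k RPS per instance, 2x headroom.
print(instances_needed(120_000, 25_000))  # 10
```

The same arithmetic applies per region when the instances are spread geographically.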

Modern Envoy-based proxies are quite popular and can scale and perform better, mostly because Envoy is written in C++.

Cloudflare used NGINX until recently, and that would count as Google scale IMO.
Good to know. Any idea why Google chose to use custom servers instead of Nginx or Apache? Especially when Nginx has excellent built-in load-balancing capabilities?
Because their own servers solve their problems better. Note that the Cloudflare reference is in the past tense ("used to"): just because you can use nginx at scale doesn't mean it's the best option, and such deployments aren't running stock nginx, nor nginx alone.
Got it, thanks. Which servers are best suited to handling such high load and are free and open to use, like Nginx or Apache? I read that Apache has even more problems than Nginx.
Nothing wrong with NGINX. Read Cloudflare's post on Pingora.

Different use cases have different needs, and NGINX is more of an out-of-the-box (OOTB) solution. NGINX is great. I see Caddy mentioned quite a bit; I believe it's written in Go, whereas NGINX is written in C. Cloudflare wrote Pingora in Rust IIRC, and I want to say they are releasing an open-source flavor.

Hmm, I read the post. Cloudflare handles almost 1 trillion requests per day; Google handles that many per year. However, the request types are different: Cloudflare handles a variety of requests, such as caching and forwarding, while Google's requests are about serving complete web pages. So maybe, for a web-serving purpose like Google's, Nginx would suffice?
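Putting the two volumes on the same footing makes the gap concrete. Both figures are taken at face value from this thread, not independently verified:

```python
SECONDS_PER_DAY = 86_400

cloudflare_per_day = 1_000_000_000_000  # ~1 trillion requests/day, as stated in the comment
google_per_year = 1_200_000_000_000     # ~1.2 trillion queries/year, as stated in the question
google_per_day = google_per_year / 365  # assumes a 365-day year

print(f"Cloudflare: ~{cloudflare_per_day / SECONDS_PER_DAY:,.0f} req/s")  # ~11,574,074 req/s
print(f"Google search: ~{google_per_day / SECONDS_PER_DAY:,.0f} queries/s")  # ~38,052 queries/s
print(f"ratio: ~{cloudflare_per_day / google_per_day:,.0f}x")  # ~304x
```

As the comment says, a search query and a CDN request are very different units of work, so raw counts alone don't settle which workload is harder to serve.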
TBH, it looks like Nginx has a bunch of issues: https://www.youtube.com/watch?v=QbOAHkaFU6w&ab_channel=Husse...
Probably yes.

You can scale horizontally with a lot of web servers and load balance them. You don't need to stretch load balancer clusters across AZs/regions, so I see no reason why Nginx wouldn't work.
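A minimal sketch of that layout, assuming each region has its own independent pool of web servers (the region names, addresses, and class name are hypothetical). A client is first routed to a region, e.g. by DNS, and a region-local balancer then round-robins across that region's servers, so no load-balancer cluster has to span regions.

```python
from itertools import cycle

class RegionalBalancer:
    """Round-robin load balancing within each region; no cross-region pool.

    Hypothetical sketch: a real deployment would add health checks,
    weights, and failover to another region.
    """

    def __init__(self, pools):
        # pools: region name -> list of backend addresses in that region
        if not pools or any(not servers for servers in pools.values()):
            raise ValueError("every region needs at least one backend")
        self._iters = {region: cycle(servers) for region, servers in pools.items()}

    def pick(self, region):
        """Return the next backend in the given region (round-robin)."""
        return next(self._iters[region])

lb = RegionalBalancer({
    "eu-west": ["10.0.1.10", "10.0.1.11", "10.0.1.12"],
    "us-east": ["10.1.1.10", "10.1.1.11"],
})
print([lb.pick("eu-west") for _ in range(4)])  # ['10.0.1.10', '10.0.1.11', '10.0.1.12', '10.0.1.10']
print([lb.pick("us-east") for _ in range(3)])  # ['10.1.1.10', '10.1.1.11', '10.1.1.10']
```

Because each region's pool is independent, capacity is added region by region, which is what makes plain horizontal scaling workable here.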

I assume that at that scale, session cookies are shared without clustering the application servers; a distributed KV store is probably enough.
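A minimal sketch of that idea, assuming session data lives in a shared key-value store keyed by the session cookie, so any stateless web server can handle any request without sticky sessions or app-server clustering. A plain dict stands in for the distributed KV store (a real deployment might use something like Redis or memcached), and the class and function names are hypothetical.

```python
import secrets

class SessionStore:
    """Stand-in for a distributed KV store (hypothetical; a dict in one process)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

def handle_request(server_name, store, cookies):
    """Runs on any web server; all session state lives in `store`.

    Returns (response_text, cookies_to_set).
    """
    session_id = cookies.get("session_id")
    session = store.get(session_id) if session_id else None
    set_cookies = {}
    if session is None:
        session_id = secrets.token_urlsafe(16)  # new opaque session cookie
        session = {"visits": 0}
        set_cookies["session_id"] = session_id
    session["visits"] += 1
    store.put(session_id, session)  # write back so other servers see the update
    return f"{server_name}: visit {session['visits']}", set_cookies

store = SessionStore()
body, new_cookies = handle_request("web-1", store, {})
print(body)  # web-1: visit 1
body, _ = handle_request("web-2", store, new_cookies)  # a different server
print(body)  # web-2: visit 2
```

The second request lands on a different server yet continues the same session, because the servers share the store rather than each other's memory.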

Hi, I thought so too. But then why didn't it work for Cloudflare, and why did their custom solution, Pingora, work?
As it turns out, they didn't like that each request lands on a single worker and can only reuse connections on that worker, which gets worse as the number of workers per server increases. Presumably, scaling horizontally wasn't cost-efficient enough.
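A toy model of that effect, under simplifying assumptions (not Cloudflare's actual measurements): requests arrive one at a time, each picks an origin and a worker uniformly at random, and any idle connection to the right origin can be reused. With per-worker pools, a request reuses a connection only if its own worker has already talked to that origin; with one pool shared across workers, at most one connection per origin is ever opened.

```python
import random

def new_connections(num_requests, num_origins, num_workers, shared_pool, seed=0):
    """Count upstream connections opened under a toy model.

    Assumptions: requests are sequential, each picks a random origin and a
    random worker, and an idle connection to an origin is always reusable.
    """
    rng = random.Random(seed)
    idle = set()  # pool keys that already hold an idle connection
    opened = 0
    for _ in range(num_requests):
        origin = rng.randrange(num_origins)
        worker = rng.randrange(num_workers)
        # Shared pool: keyed by origin only. Per-worker pools: keyed by (worker, origin).
        key = origin if shared_pool else (worker, origin)
        if key not in idle:
            opened += 1    # nothing reusable is visible to this worker
            idle.add(key)  # the connection returns to the pool afterwards
    return opened

for workers in (1, 4, 16, 64):
    per_worker = new_connections(10_000, 100, workers, shared_pool=False)
    shared = new_connections(10_000, 100, workers, shared_pool=True)
    print(f"{workers:>2} workers: per-worker pools opened {per_worker}, shared pool opened {shared}")
```

The exact numbers are artifacts of the toy model; the trend (more workers, less connection reuse) is the point the comment makes.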

They also wanted something written in a memory-safe language, so it could be extended without easily running into memory-safety bugs. NGINX is written in C.

Pornhub uses nginx, and it's a top-5 website by traffic.

Source: https://davidwalsh.name/pornhub-interview

Nginx is not going to be your bottleneck. If it is, you don't need to be asking on here. :)
We don’t all work directly with megascale systems. It can still be interesting to learn about how those systems work, and what considerations need to be made in those circumstances.
What does Google (or Facebook) actually use at their scale?