by SomeCallMeTim 3698 days ago
I was shocked that 12 requests/second could take down any site.

I use async logic (previously OpenResty, more recently NodeJS and Go) and largely pregenerated sites, so 2500 requests/second is a minimum baseline -- on a much lower-end instance than an m4.xlarge.

There's a reason I don't use PHP (or any primarily synchronous language like Ruby) any more.


The language used is meaningless absent the context of the whole application, especially the database. I can do 2500 requests a second with Ruby or PHP on a micro instance on AWS. But that's meaningless once I plug into a MySQL database that's going to bottleneck at 2 requests a second when I try to render a bad WordPress theme out of MySQL.
That 2500 is after taking into account relevant database queries. With async servers (which, as far as I know, PHP can't be, at least with the way it typically integrates into Apache) you can accept all the requests, forward off the database requests each depends on, and still send back the pages to everyone who requested one.

Of course I've avoided WordPress and similar frameworks for years for a reason as well; I've seen front-end frameworks that require dozens if not hundreds of database queries to render a page, which is such bad design that it makes my brain hurt.

Node.js does have its advantages, but it's possible to be very fast with PHP.

For example, the advantage of the "warm app" with node is often approximated by using a shared memory kv store in php.
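The "warm app" idea is that expensive results survive between requests instead of being rebuilt on every hit. Here is a minimal sketch of a time-to-live cache, with Python standing in for both runtimes. The `TTLCache` class, the 60-second TTL, and `render_page` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-process cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 60.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive work
        value = compute()  # miss or expired: do the work once, then keep it warm
        self._store[key] = (now + self.ttl, value)
        return value


render_calls = 0


def render_page() -> str:
    """Hypothetical stand-in for an expensive render that would hit the database."""
    global render_calls
    render_calls += 1
    return "<html>...</html>"


cache = TTLCache(ttl=60.0)
for _ in range(1000):
    cache.get_or_compute("/home", render_page)
print(render_calls)  # 1: the render ran once for 1,000 lookups
```

One caveat: this dict lives inside a single process, which matches a long-running Node process. PHP worker processes don't share memory like that, which is why PHP setups reach for a shared-memory store such as APCu instead.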

And, while an async I/O approach scales in a simpler way, you can typically find an optimum FastCGI tuning that scales very well.

Looking at this benchmark: https://www.techempower.com/benchmarks/#section=data-r12&hw=...

There are php-fpm implementations running at the same clip as node, and an hhvm implementation trouncing it.

Yes, benchmarks are sometimes bullshit, but the idea that node's approach is somehow light years ahead just doesn't pan out in the real world.

>benchmarks are sometimes bullshit, but the idea that node's approach is somehow light years ahead just doesn't pan out in the real world.

It's in the real world where you have database connections that can't always scale to 186k/second with low latency on all responses, which is what's required to get the performance out of HHVM in the linked benchmark. In a typical architecture you may not have the database local to the PHP server, meaning latency will be much higher. And it's upstream latency that kills a synchronous connection.
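Little's law makes the latency point concrete: the average number of requests in flight equals the arrival rate times the time each request spends in the system. In a one-request-per-worker model, every in-flight request pins a worker, so this sketch is effectively a count of the synchronous workers you'd need. An async server carries the same in-flight requests as cheap pending callbacks. The latencies below are made up for illustration.

```python
def workers_needed(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average requests in flight = arrival rate * time each one spends in the system."""
    return requests_per_second * latency_seconds


# Hypothetical figures: the same 1,000 req/s with a local vs. a remote database.
local_db = workers_needed(1_000, 0.002)   # 2 ms round trip
remote_db = workers_needed(1_000, 0.050)  # 50 ms round trip
print(local_db, remote_db)  # about 2 vs. 50 requests in flight
```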

And it's in the real world where a number of third party queries may be involved with a request, and those queries may take 100ms to resolve, during which time your thread is blocked in PHP whereas Node can be busy handling other requests or even other aspects of the same request (a Promise.all of a half dozen simultaneous database queries, for instance).
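The Promise.all pattern mentioned above has a direct analogue in Python's `asyncio.gather`. This sketch uses `asyncio.sleep` as a hypothetical stand-in for six independent database or third-party calls. Because they are awaited together, the request takes about as long as the slowest call, not the sum of all six.

```python
import asyncio
import time


async def fake_query(name: str, latency: float) -> str:
    """Hypothetical stand-in for a DB or third-party call that just waits on I/O."""
    await asyncio.sleep(latency)
    return name


async def handle_request() -> list:
    # Six independent lookups started together, the asyncio analogue of Promise.all.
    return await asyncio.gather(*(fake_query(f"q{i}", 0.1) for i in range(6)))


start = time.perf_counter()
results = asyncio.run(handle_request())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.1 s in total, not 6 x 0.1 s
```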

Async approaches are light years ahead in the real world. I keep seeing references to HHVM when people try to defend PHP, so I did a Google search to find out if it supports async, and the answer is "a little bit." [1] Basically it looks like, within a single request, it can execute several queries in parallel, like my Promise.all example above. So it looks like Hack, at least, has that feature. But as far as I can tell it doesn't mix multiple connections in a single thread. And you have to be careful to use only async-aware operations or you lose the benefits; Node is designed around async behavior, so everything you're likely to use supports it by default.

Those benchmarks are in fact unrealistic. And the Node implementation in that benchmark uses a single connection to MySQL, so the 20 queries are actually executed in series instead of in parallel. [0] If they used a connection pool instead, they could all execute in parallel. Look at the numbers:

  Queries/request       1       5      10      15      20
  nodejs (resp/s)  85,490  22,917  12,083   8,250   6,254
  hhvm (resp/s)    12,369  12,428  10,056  10,394   9,322

The NodeJS results shouldn't drop off that fast after the single-query case unless all the queries share one MySQL connection; even a pool of just 4 connections should speed it up a lot. Also, the HHVM source uses stored procedures while the Node version recompiles the query every time. Finally, Node can be faster when fronted by Nginx, but this benchmark uses Node's built-in server instead.
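The single-connection vs. pool argument can be simulated without a real database. This is a sketch, not a MySQL driver: an `asyncio.Semaphore` models the pool's connection limit, and the 20 ms per-query latency is hypothetical. With one "connection", the 20 queries queue up behind each other. With four, they run four at a time.

```python
import asyncio
import time

QUERY_LATENCY = 0.02  # hypothetical round trip per query, in seconds


async def run_query(pool: asyncio.Semaphore) -> None:
    async with pool:  # borrow a "connection"; wait if all of them are busy
        await asyncio.sleep(QUERY_LATENCY)


async def handle(queries: int, connections: int) -> float:
    pool = asyncio.Semaphore(connections)  # models the pool's connection limit
    start = time.perf_counter()
    await asyncio.gather(*(run_query(pool) for _ in range(queries)))
    return time.perf_counter() - start


serial = asyncio.run(handle(queries=20, connections=1))
pooled = asyncio.run(handle(queries=20, connections=4))
print(f"1 connection: {serial:.2f}s, 4 connections: {pooled:.2f}s")
```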

Here's an article I just stumbled across that talks about the limits of HHVM to accelerate PHP: [2] It touches on some of the same points. Async is the important way to speed up real world apps, and it's not the default "PHP Way."

Aside from that, probably 90% of the people using PHP are using "normal" PHP on a hosted server in Apache and not HHVM at all. So you're basically arguing hypotheticals: "IF they use HHVM, and IF they write the code exactly right, and IF their upstream servers have really low latency all the time, then PHP isn't much slower than Node."

Node is usually faster in the real world given common coding patterns. And Node gives you Socket.IO, which pretty much kills the relevance of PHP no matter how you slice it. Even long polling would slaughter a PHP server; you'd be able to support at most one concurrent user per thread. Async servers are good at supporting long-running connections.
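To see why long polling favors async servers, consider this sketch in Python. Each waiting client is just a suspended coroutine parked on an event, not a blocked thread, so ten thousand of them fit in one thread. There are no real sockets here, and `long_poll` and the client count are hypothetical.

```python
import asyncio


async def long_poll(client_id: int, new_message: asyncio.Event) -> int:
    # Each waiting client is just a suspended coroutine, not a blocked thread.
    await new_message.wait()
    return client_id


async def main(clients: int = 10_000) -> int:
    new_message = asyncio.Event()
    waiters = [asyncio.create_task(long_poll(i, new_message)) for i in range(clients)]
    await asyncio.sleep(0.1)  # all clients are now parked, holding their "connections" open
    new_message.set()  # one event wakes every waiter
    done = await asyncio.gather(*waiters)
    return len(done)


served = asyncio.run(main())
print(served)
```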

[0] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

[1] https://docs.hhvm.com/hack/async/introduction

[2] http://www.infoworld.com/article/2948132/app-servers/hhvm-38...

From what I remember of Symfony (it's been years) their ORM had some nasty memory leaks.
If the request is performing heavy calculations you will see fewer req/sec, obviously.

Say an API call spins up a Linux VM and makes it available to some user. Or a bulk upload of data which needs to be indexed. Or whatever.

The idea that a site should be able to handle X requests/sec because the stack can handle X NOOPs per second is odd.

Not talking NOOPs. I'm talking multiple full round trips to the database per query.

Node can render more NOOPs per second than that; I've heard of a well tuned Node server hitting 100k. But because of the async nature of the handling of responses, you don't need dozens or hundreds of threads to handle thousands of clients, and it's the threads that kill you.

Unless you've just got an awful architecture, in which case that will kill you first.

I still don't understand what point you are trying to make. Surely you must agree that the number of requests per second your server can handle depends on what the server is doing? If your operation calls a DB or calculates something and that is heavy, then your db or internal calculation service might run out of resources. I'm not sorry, but your lack of experience with large-scale systems is visible.
Most web sites are just grabbing data from databases. If you use an async front-end, it's the database servers that need to scale, because they're doing the heavy lifting. But scaling your database infrastructure can be orthogonal to scaling the application layer; in Node, except for under extreme load, you may never NEED to scale the application layer. And for really high load situations, you may need 25-50x fewer Node servers than PHP.

It's the poor design of most systems that causes them to not scale; there are certainly exceptions where the server needs to do a lot of complex calculations, but those calculations can most of the time be handled in microservices and, again, scaled orthogonally to the application layer.

PornHub runs on a PHP stack. Do you think that maybe they get a little more than 12 requests a second?
I wouldn't blame that on PHP. You can make PHP sites with reasonable performance.
A well configured small / medium instance should easily handle 100 requests/second. My test PHP setup on micro instances serves 1000 requests/second before any signs of slowdown. [1]

[1] https://nestify.io/wp-content/uploads/2015/10/loader.io_.png

Every application is different and has different requirements. I know absolutely nothing about the original author's application so I can't comment on any specifics. But just because your "test PHP setup" can handle X amount of r/s does not suggest that any other application should be able to do the same.
A better test would be to disable any cache plugins in your WordPress and then run tests against your site.
Put a database behind it with complicated enough requests, and you'll run out of connections to the db regardless of whether you use async I/O. You'll have to add a connection pooler or multiple master/slave dbs, and even then you will run into problems with enough traffic. Not everything is sending back some json from something that is readily available.
To really scale a complex architecture, certainly there are other things you need to do, like setting up database replication and/or sharding, caching, and other scaling strategies.

That can typically be done from Node by simply adding the "use pooling" option in your database library (or sometimes by switching to the npm package for that database that enables pooling), and when you have additional slave databases, adding those to the database init call.
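Adding replicas to the init call typically means the driver routes reads for you. This sketch shows the idea in Python: writes go to the primary and reads rotate across replicas. The `ReplicaRouter` class and host names are hypothetical. The SELECT-prefix check is deliberately naive, and real drivers have their own configuration and handle failover, which this ignores.

```python
import itertools
from typing import Iterator, List


class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._reads: Iterator[str] = itertools.cycle(replicas or [primary])

    def host_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")  # naive read detection
        return next(self._reads) if is_read else self.primary


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.host_for("SELECT * FROM posts"), router.host_for("INSERT INTO posts VALUES (1)"))
```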

As far as Node is concerned, it really is that easy. Scaling PHP, though, pretty much requires that you add more threads, which means (after a point) you need a lot of RAM, or that you add more instances and load-balance between them.
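You can estimate the RAM ceiling with the usual rule of thumb for sizing PHP-FPM's `pm.max_children`: divide the memory available to PHP by the typical size of one worker process. The machine size, reserved memory, and per-worker footprint below are hypothetical.

```python
def max_workers(ram_mb: int, reserved_mb: int, per_worker_mb: int) -> int:
    """Rule of thumb: how many worker processes fit in the RAM left over for PHP."""
    return max(0, (ram_mb - reserved_mb) // per_worker_mb)


# Hypothetical box: 16 GB RAM, 2 GB kept for the OS and other services, ~60 MB per PHP worker.
workers = max_workers(16_384, 2_048, 60)
print(workers)  # 238
```

Each of those workers still serves only one request at a time, so adding capacity beyond that number means adding RAM or adding boxes.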

Node won't prevent database scaling issues, but it will keep the part that it does handle a lot easier to maintain.

But can you always serve pregenerated sites? In my case I have a search page and some dynamic features, so I would have to generate a huge number of pages (though while writing this, I'm thinking it could be feasible, if a bit complicated, to regenerate them in all the situations that call for it).
I bet you I can write a website in NodeJS or Go that will fail with fewer than 1 request / second. Heck, I bet you I can make a website that will fail even if it only receives one request in its lifetime!

The language usually isn't the reason for issues like this...

Strawman.

For tasks that are primarily IO bound, async architectures can scale more than synchronous languages. Period.

It would take intentionally (or newb/cluelessly) bad design to end up with a Node server that DOSed at 12 connections per second.

In PHP, if you run 4 threads, it just takes a backend with a 333ms latency (on all queries performed serially) to limit you to 12 connections per second. If you only run one thread, you just need a backend with a cumulative 83ms latency to get DOSed at 12 connections per second. In a more realistic scenario, a typical crappy framework will result in dozens of queries for a single page, but it comes down to the same thing.
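The ceiling in the paragraph above is just threads divided by per-request backend latency, since each thread is blocked for the whole wait. This sketch checks the two figures from the comment and adds a one-thread, 333 ms case for contrast.

```python
def sync_ceiling(threads: int, latency_s: float) -> float:
    """Max requests/second when each thread is blocked for the whole backend latency."""
    return threads / latency_s


print(round(sync_ceiling(4, 0.333), 1))  # 4 threads, 333 ms of serial backend latency
print(round(sync_ceiling(1, 0.083), 1))  # 1 thread, 83 ms
print(round(sync_ceiling(1, 0.333), 1))  # 1 thread, 333 ms
```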

In Node, you can run one thread with a cumulative 333ms backend latency and still handle thousands of connections per second. They'll each just end up waiting a bit more than 333ms for their results, assuming the database itself isn't DOSed (which takes a surprisingly high load -- way above the levels we're talking). Actually, depending on how interconnected the backend queries are, Node may actually result in less than a total 333ms latency, because many of those queries may also be parallelized by the browser, and will then be handled in parallel by the server (and much of the latency may actually be in http negotiations and/or establishing a database connection, honestly).
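The single-thread claim can be simulated with Python's asyncio event loop, used here as a stand-in for Node's. A thousand concurrent requests each wait 333 ms on a simulated backend, and the whole batch finishes in roughly 333 ms rather than 1,000 × 333 ms. The handler names are hypothetical, and the real database is assumed not to be the bottleneck, as the comment stipulates.

```python
import asyncio
import time

BACKEND_LATENCY = 0.333  # cumulative backend wait per request (the 333 ms figure from the comment)


async def handle_request(request_id: int) -> int:
    await asyncio.sleep(BACKEND_LATENCY)  # waiting on the backend; the loop serves others meanwhile
    return request_id


async def serve(n: int) -> list:
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


start = time.perf_counter()
responses = asyncio.run(serve(1_000))
elapsed = time.perf_counter() - start
print(len(responses), f"{elapsed:.2f}s")
```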

>>It would take intentionally (or newb/cluelessly) bad design to end up with a Node server that DOSed at 12 connections per second.

Being single-threaded has its own pitfalls. I assume you can imagine some cpu bound tasks that would have node.js at 12 connections/sec or less.

>I assume you can imagine some cpu bound tasks that would have node.js at 12 connections/sec or less.

Certainly. That would fall under "newb/clueless" design, though. Anyone who would throw a CPU-bound task into a primary Node server shouldn't be allowed near architectural design decisions. Whereas code in PHP written using best practices can easily end up with a server that can barely hit 100 queries per second.

Imagine, for instance, a situation where the client needs to make 50 requests to the server to render a page [1][2], and each one incurs 20ms of latency on the PHP side; assuming you're running 8 threads (and the client makes 8 concurrent requests), a single page load could block your server for 125ms. A slow client or network might block your PHP threads for even longer. Node could crunch through ten thousand requests like that per second when running on four CPUs, meaning 200 of these bloated pages rendered per second, compared to ... 8 or less.
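Working the page arithmetic through explicitly, using the comment's own figures: 10,000 requests/second divided by 50 requests per page gives 200 pages/second for Node. For PHP, each page costs 50 × 20 ms of thread time spread over 8 threads.

```python
REQUESTS_PER_PAGE = 50
LATENCY_S = 0.020          # PHP-side latency per request
PHP_THREADS = 8
NODE_REQS_PER_S = 10_000   # the comment's estimate for Node on four CPUs

php_block_s = REQUESTS_PER_PAGE * LATENCY_S / PHP_THREADS  # 0.125 s of server time per page
php_pages_per_s = 1 / php_block_s                          # 8 pages/s
node_pages_per_s = NODE_REQS_PER_S / REQUESTS_PER_PAGE     # 200 pages/s
print(php_block_s, php_pages_per_s, node_pages_per_s)
```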

Even with decent client pages that can render with ONE server query, a couple dozen database lookups are par for the course, sometimes including an authentication lookup on a third-party OAuth server. That could be 125ms all by itself, and in PHP your thread is blocked while the lookup happens. With the async model, once the query is off, the server is doing work on other requests until the query has returned data.

Many CPU-bound tasks like "convert an image" are already coded in Node to happen in the background, triggering a callback when they're done so you can then send the result to the client. And in Node it's absolutely trivial to offload any likely CPU-bound task to a microservice, where the NodeJS server just queries the microservice and waits for the result. Which you'd want to do, of course, if a task is CPU-bound, because you would want a faster server than V8 running it anyway. Go would be a likely candidate, and Go handles threading either through light/async threads or via actual threading, as necessary. It's quite awesome.
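The "run it in the background and get called back" pattern can be sketched in Python with `loop.run_in_executor`. A heartbeat coroutine keeps ticking while a CPU-heavy job runs off the event loop. `cpu_heavy` is a hypothetical stand-in for something like image conversion. One caveat: because of Python's GIL, a thread pool keeps the loop responsive but doesn't add CPU parallelism. For that you'd use a `ProcessPoolExecutor` or, as the comment suggests, a separate service.

```python
import asyncio
import hashlib
import time


def cpu_heavy(data: bytes, rounds: int) -> str:
    """Hypothetical stand-in for CPU-bound work such as converting an image."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    # Keeps ticking only if the event loop stays free while the heavy job runs.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def main() -> tuple:
    loop = asyncio.get_running_loop()
    ticks: list = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    # None selects the loop's default ThreadPoolExecutor; awaiting yields until the job is done.
    result = await loop.run_in_executor(None, cpu_heavy, b"image-bytes", 100_000)
    stop.set()
    await beat
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result[:16], tick_count)
```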

And if you really can't trust your developer to write code without extensive time-consuming calculations, then make them use Elixir or Erlang. It will use preemptive multitasking at the VM level if a thread takes up too much time, and even if they foolishly write a task that takes hundreds of milliseconds to complete, it will still task swap and serve other clients.

But arguing that pathologically bad code in Node can make it perform as badly as PHP does all the time isn't exactly a ringing endorsement for the language.

[1] In 2014 the average number of objects a web page requested was 112, and that number seemed to still be climbing, though I'm assuming a lot of those are static resources and third-party requests, like for analytics and ads. http://www.websiteoptimization.com/speed/tweak/average-web-p... I've personally seen pages with 70-80 requests against a PHP backend to render one page.

[2] And I wouldn't call a client page needing 50 requests a best practice, but I'm assuming that we're talking about the server side here, and that we are being forced to deal with an existing client that behaves that way. So call it "best practices on the server."