Hacker News
by AndyKelley 4699 days ago
I did the benchmarks outlined at the end of the article.

Summary of results:

1. Node.js v0.10.15, single worker: 46.2 seconds

2. Node.js v0.10.15, cluster 8 workers using naught: 17.2 seconds

3. Go 1.0.2, GOMAXPROCS left default: 3.5 seconds

4. Go 1.0.2, GOMAXPROCS=8: 3.7 seconds

Detailed results below:

1. Node.js v0.10.15, single worker

  Concurrency Level:      100
  Time taken for tests:   46.217 seconds
  Complete requests:      10000
  Failed requests:        0
  Write errors:           0
  Total transferred:      10486510000 bytes
  HTML transferred:       10485760000 bytes
  Requests per second:    216.37 [#/sec] (mean)
  Time per request:       462.168 [ms] (mean)
  Time per request:       4.622 [ms] (mean, across all concurrent requests)
  Transfer rate:          221580.08 [Kbytes/sec] received
  
  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    0   0.2      0       3
  Processing:   193  461  36.2    450     944
  Waiting:       16  235 127.3    235     534
  Total:        193  461  36.2    450     944
  
  Percentage of the requests served within a certain time (ms)
    50%    450
    66%    467
    75%    470
    80%    486
    90%    492
    95%    514
    98%    517
    99%    545
   100%    944 (longest request)
2. Node.js v0.10.15, cluster 8 workers using naught

  Concurrency Level:      100
  Time taken for tests:   17.199 seconds
  Complete requests:      10000
  Failed requests:        0
  Write errors:           0
  Total transferred:      10486510000 bytes
  HTML transferred:       10485760000 bytes
  Requests per second:    581.41 [#/sec] (mean)
  Time per request:       171.995 [ms] (mean)
  Time per request:       1.720 [ms] (mean, across all concurrent requests)
  Transfer rate:          595408.80 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    0   0.2      0       3
  Processing:     7  171 116.4    149     739
  Waiting:        5   96  81.9     71     710
  Total:          8  171 116.5    150     740

  Percentage of the requests served within a certain time (ms)
    50%    150
    66%    197
    75%    236
    80%    266
    90%    324
    95%    397
    98%    438
    99%    493
   100%    740 (longest request)
  
3. Go 1.0.2, GOMAXPROCS left default

  Concurrency Level:      100
  Time taken for tests:   3.542 seconds
  Complete requests:      10000
  Failed requests:        0
  Write errors:           0
  Total transferred:      10486730000 bytes
  HTML transferred:       10485760000 bytes
  Requests per second:    2823.16 [#/sec] (mean)
  Time per request:       35.421 [ms] (mean)
  Time per request:       0.354 [ms] (mean, across all concurrent requests)
  Transfer rate:          2891181.71 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    1   0.3      1       3
  Processing:     9   35   2.2     34      56
  Waiting:        0    1   1.3      1      22
  Total:         12   35   2.3     35      57

  Percentage of the requests served within a certain time (ms)
    50%     35
    66%     36
    75%     36
    80%     36
    90%     37
    95%     38
    98%     39
    99%     41
   100%     57 (longest request)
  
4. Go 1.0.2, GOMAXPROCS=8

  Concurrency Level:      100
  Time taken for tests:   3.657 seconds
  Complete requests:      10000
  Failed requests:        0
  Write errors:           0
  Total transferred:      10486730000 bytes
  HTML transferred:       10485760000 bytes
  Requests per second:    2734.54 [#/sec] (mean)
  Time per request:       36.569 [ms] (mean)
  Time per request:       0.366 [ms] (mean, across all concurrent requests)
  Transfer rate:          2800429.67 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    1   0.4      1       3
  Processing:    19   36   2.5     35      57
  Waiting:        0    1   1.1      1      16
  Total:         20   37   2.5     36      58

  Percentage of the requests served within a certain time (ms)
    50%     36
    66%     37
    75%     37
    80%     37
    90%     38
    95%     39
    98%     42
    99%     51
   100%     58 (longest request)
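For anyone cross-checking the numbers above: ab's headline figures can be re-derived from the raw counts. The following is only a sketch, not ab's own code. The helper name `derive` is made up for illustration, and it assumes ab's "Kbytes" means 1024 bytes. Because ab prints the elapsed time rounded to milliseconds, the results should match the report only to within rounding.

```go
package main

import "fmt"

// abSummary holds the figures ab derives from the raw counts.
type abSummary struct {
	reqPerSec    float64 // "Requests per second"
	msPerRequest float64 // "Time per request" (mean)
	msAcrossAll  float64 // "Time per request" (mean, across all concurrent requests)
	kbytesPerSec float64 // "Transfer rate", assuming 1 Kbyte = 1024 bytes
}

// derive recomputes ab's summary lines from the raw counts.
// Hypothetical helper for illustration; it is not taken from ab.
func derive(requests, concurrency int, seconds, totalBytes float64) abSummary {
	rps := float64(requests) / seconds
	return abSummary{
		reqPerSec:    rps,
		msPerRequest: float64(concurrency) * 1000 / rps,
		msAcrossAll:  1000 / rps,
		kbytesPerSec: totalBytes / 1024 / seconds,
	}
}

func main() {
	// Raw counts copied from run 1 (Node.js, single worker).
	s := derive(10000, 100, 46.217, 10486510000)
	fmt.Printf("%.2f req/s, %.3f ms, %.3f ms, %.2f Kbytes/s\n",
		s.reqPerSec, s.msPerRequest, s.msAcrossAll, s.kbytesPerSec)
}
```

The same function can be fed the counts from the other three runs to compare them on equal terms.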
7 comments

There were significant performance increases in Go 1.1 (see: http://golang.org/doc/go1.1#performance). I would also suggest you benchmark a more current version.
When I see someone doing something like this and:

  1. They are not using the latest Go (atm 1.1.2)
  2. GOMAXPROCS is not = the number of CPUs
  3. They are using ab rather than something more scalable like wrk
I assume they either don't know what they are doing, or want to make Go look bad.

On a side note, Go is already known to be much faster at web-serving than node.js: http://www.techempower.com/benchmarks/#section=data-r6&hw=i7...

I did the test that the article suggested, with the versions of the tools that I have installed on my system. This is a comment on a blog article, not an attempt to engineer the perfect benchmark.

Also in this bench Go danced circles around node. I dunno what you're complaining about.

>I dunno what you're complaining about.

I'm complaining because you and other people will go on to:

  1. Use a version of Go in your development that is slower (and has much worse memory characteristics) and lacks new features.
  2. End up running all your programs on a single core until you understand GOMAXPROCS
  3. Use ab to bench real things, which is bad
So "my complaining" is trying to help you.
I agree with Voidlogic here. Perhaps his tone was a little confrontational, but his intentions were good. :)

    Go 1.1 > Go 1.0.2
    wrk > ab
In particular, ab should be avoided whenever possible. Apache Bench (ab) remains a single-threaded tool, meaning that for high-performance servers your exercise will run into the limits of Apache Bench before the limits of the server(s) being tested. The lighttpd team has a multi-threaded clone named WeigHTTP that I would recommend if you want something that is functionally similar to ab and uses similar command-line arguments.

Wrk uses a slightly different argument syntax from ab and WeigHTTP but has some upsides:

1. Wrk is also multi-threaded.

2. In our experience, wrk is slightly higher-performance than WeigHTTP (~5 to 10%).

3. Wrk provides average, maximum, and stdev for latency.

4. Wrk provides a time-limited mode (rather than request-count limited), which is appealing for some test types.
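To make the points above concrete, here is a rough sketch of the idea behind a multi-worker, time-limited load generator that reports mean, stdev, and max latency. This is not how wrk or WeigHTTP are implemented. The names `runLoad` and `summarize` are made up, and the in-process `httptest` server is only a stand-in for a real target.

```go
package main

import (
	"fmt"
	"io"
	"math"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// summarize returns the mean, population standard deviation, and peak
// of the given latencies (in milliseconds).
func summarize(ms []float64) (mean, stdev, peak float64) {
	if len(ms) == 0 {
		return 0, 0, 0
	}
	for _, v := range ms {
		mean += v
		if v > peak {
			peak = v
		}
	}
	mean /= float64(len(ms))
	for _, v := range ms {
		stdev += (v - mean) * (v - mean)
	}
	return mean, math.Sqrt(stdev / float64(len(ms))), peak
}

// runLoad issues GET requests from `workers` goroutines until `d` has
// elapsed (time-limited rather than request-count limited) and returns
// every observed latency in milliseconds.
func runLoad(url string, workers int, d time.Duration) []float64 {
	var (
		mu  sync.Mutex
		all []float64
		wg  sync.WaitGroup
	)
	deadline := time.Now().Add(d)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var local []float64
			for time.Now().Before(deadline) {
				start := time.Now()
				resp, err := http.Get(url)
				if err != nil {
					continue // failed requests are skipped in this sketch
				}
				io.Copy(io.Discard, resp.Body)
				resp.Body.Close()
				local = append(local, float64(time.Since(start))/float64(time.Millisecond))
			}
			mu.Lock()
			all = append(all, local...)
			mu.Unlock()
		}()
	}
	wg.Wait()
	return all
}

func main() {
	// A throwaway in-process server stands in for the system under test.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	lat := runLoad(srv.URL, 4, 200*time.Millisecond)
	mean, stdev, peak := summarize(lat)
	fmt.Printf("%d requests: mean %.2f ms, stdev %.2f ms, max %.2f ms\n", len(lat), mean, stdev, peak)
}
```

For real measurements, use wrk or WeigHTTP. The point of the sketch is only that several workers generate load in parallel, so the client is less likely to become the bottleneck.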

In my experience, as long as you configure Go and node to use all of your cores, Go will benchmark better than Node in any permutation of these configuration variables:

    Go 1.0.2 and node tested with ab.
    Go 1.1 and node tested with ab.
    Go 1.0.2 and node tested with wrk.
    Go 1.1 and node tested with wrk.
V8 is a very fast JavaScript runtime; node.js is modestly quick at handling HTTP requests. But among the many features of Go is a high-performance HTTP package. If you've used both, it isn't all that surprising that Go's performance clocks in higher than node.
Is this the wrk you're referring to:

https://github.com/wg/wrk

Yes. Sorry for not providing the link!
> Perhaps his tone was a little confrontational

Sorry, that was not intended.

>but his intentions were good. :)

They really were...

>> wrk > ab

Can you please expand on why? I recently bumped into wrk and am in the process of evaluating a switch from ab. Thank you.

daemon13, you might also be interested in reading this thread: https://news.ycombinator.com/item?id=6114282
Sure. I just edited my message above.
Can you help me by telling me why the way I used GOMAXPROCS is wrong, and how to use it correctly?
Not so much wrong as my impression was you didn't understand it. If that is not the case I apologize.

For example you said: GOMAXPROCS left default. I don't know how you set your environment vars, are they unset so default = 1? You didn't mention in your post that GOMAXPROCS=1/single node worker test cases are really toy test cases (useful only for benchmarking). So if you know everything below, then great! Maybe other people can learn:

GOMAXPROCS is the number of OS-level threads over which the Go runtime multiplexes Go tasks (goroutines).

So if GOMAXPROCS = 1, when one goroutine blocks, another will run, but you will never use more than one OS thread and thus you will never use more than one logical core.

Setting GOMAXPROCS correctly is per application. For example, GOMAXPROCS=1 might be right for a command-line tool or a program that was designed to have multiple instances started on the same machine. That being said, the vast majority of the time, any high-load application I have written is best with GOMAXPROCS=<# of logical CPU cores>. So Go always has concurrency, but GOMAXPROCS gives it parallelism. GOMAXPROCS > 1 will also allow the garbage collector to have more parallelism.

So if we are talking about a benchmark like this, ideally we want to process requests made in parallel in a parallel fashion. A clear sign is that if you use Node.js worker cluster you should probably test with Go at the same number.

All this being said, depending on your CPU's implementation details, you would sometimes be better off setting both your node worker count and GOMAXPROCS to the number of physical rather than logical cores. Sometimes simultaneous multi-threading (SMT, aka hyperthreading) actually creates more overhead than any concurrency gains it offers.
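The Go runtime only reports logical CPUs, so the physical count has to come from somewhere else. One possible approach on x86 Linux, sketched below, is to count distinct (physical id, core id) pairs in /proc/cpuinfo. The helper `countCores` is made up for illustration, and it assumes the usual x86 layout of that file; other platforms and architectures will need a different method.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// countCores returns the physical and logical core counts found in
// /proc/cpuinfo-style text. It assumes the x86 Linux layout, where each
// logical CPU has "processor", "physical id" and "core id" lines.
func countCores(cpuinfo string) (physical, logical int) {
	seen := map[string]bool{}
	socket := ""
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch key {
		case "processor":
			logical++
		case "physical id":
			socket = val
		case "core id":
			// A physical core is identified by its socket plus core id.
			seen[socket+"/"+val] = true
		}
	}
	return len(seen), logical
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	physical, logical := countCores(string(data))
	fmt.Printf("physical cores: %d, logical cores: %d\n", physical, logical)
}
```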

In short, when testing something like this I would always test:

  1. n = 1 (with a disclaimer note)
  2. n = physical CPU count
  3. n = logical CPU count
Where n is the GOMAXPROCS value / number of node workers.
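A sketch of that test matrix on the Go side, assuming a CPU-bound stand-in workload rather than the HTTP benchmark from the article. The names `settings`, `burn`, and `timeWith` are made up, and the physical core count of 4 in `main` is an assumed value that you would replace with your own.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// settings returns the distinct values of n to test: 1, the physical
// core count, and the logical core count, in that order.
func settings(physical, logical int) []int {
	var out []int
	for _, n := range []int{1, physical, logical} {
		dup := false
		for _, m := range out {
			if m == n {
				dup = true
			}
		}
		if !dup && n > 0 {
			out = append(out, n)
		}
	}
	return out
}

// burn is a stand-in CPU-bound task; it returns a checksum so the
// work is not discarded.
func burn(iters int) uint64 {
	var x uint64 = 1
	for i := 0; i < iters; i++ {
		x = x*6364136223846793005 + 1442695040888963407
	}
	return x
}

// timeWith runs `tasks` copies of burn on goroutines with GOMAXPROCS=n
// and returns the elapsed wall-clock time. The old setting is restored.
func timeWith(n, tasks, iters int) time.Duration {
	prev := runtime.GOMAXPROCS(n)
	defer runtime.GOMAXPROCS(prev)
	sums := make([]uint64, tasks)
	var wg sync.WaitGroup
	start := time.Now()
	for i := range sums {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sums[i] = burn(iters)
		}(i)
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	// physical = 4 is an assumption; the Go runtime only reports logical CPUs.
	for _, n := range settings(4, runtime.NumCPU()) {
		fmt.Printf("n=%d: %v\n", n, timeWith(n, 8, 20000000))
	}
}
```

Timings will vary by machine, so treat the output as a relative comparison between the values of n, not as absolute numbers.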

Okay so this is why I was confused by your comment:

I did use GOMAXPROCS with the number of logical cores that I have, and I did test the node cluster with the same number.

How much of an advantage, if any, does Node provide because it's more "mature", or at least has been used for far longer?

For example, when I go to Stack Overflow I see that Node has far more questions asked:

http://stackoverflow.com/questions/tagged/go

http://stackoverflow.com/questions/tagged/node.js

I'm really uncertain about node being judged more "mature".

Being developed by Thompson and Pike makes you gain something like 30 years of "maturity". Plus, running in production in the Google infrastructure is far more of a proof of maturity than running the chat service of every hackathon project for 2 years.

So do I just email all my questions directly to Thompson and Pike, then?

In this case, language maturity means the quantity of reference material available to help troubleshoot problems as they arise.

The golang nuts mailing list is extremely active, you'd have no issues finding answers there.
Yeah I don't doubt you'd be able to find answers to any Go question, the point though is that you'll have to work a little harder than with a "more mature" "language" like Node.js.
The difference is sending emails to some mailing list and waiting for someone to answer, vs. entering a few keywords into Google and having the top result be the answer you want because someone already asked the same question before.

This is not a stab at Go or its maturity, but rather a realistic assessment of the importance of answers being (nearly) instantly available when you're working on a project or troubleshooting a critical issue.

Agree on that, yet, if you want the big picture, you should multiply that by the probability of having something to troubleshoot, or a critical issue, in the first place.

That would give you a better estimate of the risk you're taking by using that language. And that's where the 30 years of PL design from Go's authors gain some importance.

You could consider my argument a bit far-fetched, but Go's design has been explicitly focused on keeping things simple and no-surprise, and so far all the reviews seem to agree on that point. That could compensate for the lack of results in Google (especially compared to what node.js forces you to do).

Mature means many things, including being able to get support, packages, existing code, solutions to common problems, etc. I'm sure Go is a solid project.
Two smart guys writing a programming language instantly gets you 30 years of maturity. Did I just wake up in some bizarro universe or something?

And nobody cares if Go is being used for some tiny, insignificant part of the Google infrastructure. Get back to me when it is used for a stock exchange, betting site or complex web app.

What stock exchange is powered by node.js?
dl.google.com handles all the downloads for Google Chrome, factory images for Android, Eclipse... not exactly tiny and especially not insignificant.

Complex web apps: Cloudflare (https://www.cloudflare.com/railgun), BBC (http://www.quora.com/Go-programming-language/Is-Google-Go-re...), SoundCloud (http://backstage.soundcloud.com/2012/07/go-at-soundcloud/).

If this is your requirement, may I suggest:

http://stackoverflow.com/questions/tagged/java

Much more so than internet posts about it, there are countless modules for NodeJS as well. Need to upload, store, and return images in Mongo using streaming? Done. Need to keep servers running despite errors? Done. Amazon S3? Done. Need an alternate cloud provider? Done. Need login using OAuth/OpenID/whatever? Done. On and on. Most times you need some general-purpose web functionality, there's already a couple of Node modules in that area, if not an entire framework or sample app and tutorial targeted at that area.
By now, I'd assume the language run times are in the noise for thinking about maturity. The most likely culprits will be the application code followed by the library code. So the real question should be "Are there any big scary monsters in the libraries I need?". Followed by "Which language do I think will best enable me to hit my target?".
I did find this bit:

  // When calling .end(buffer) right away, this triggers a "hot path"
  // optimization in http.js, to avoid an extra write call.
  //
  // However, the overhead of copying a large buffer is higher than
  // the overhead of an extra write() call, so the hot path was not
  // always as hot as it could be.
  //
  // Verify that our assumptions are valid.
https://github.com/joyent/node/blob/master/benchmark/http/en...
Test no. 3 sends a whopping 2891 Mbytes/s over HTTP? I know it's localhost, but wow!
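To put that figure in other units, here is a quick conversion sketch. It assumes ab's "Kbytes/sec" is in units of 1024 bytes, and `convert` is a made-up helper name. Under that assumption the reported 2891181.71 Kbytes/sec works out to roughly 2.96 GB/s, or about 23.7 Gbit/s.

```go
package main

import "fmt"

// convert turns ab's "Kbytes/sec" figure (assumed to be units of 1024
// bytes) into decimal megabytes per second and gigabits per second.
func convert(kbytesPerSec float64) (mbPerSec, gbitPerSec float64) {
	bytesPerSec := kbytesPerSec * 1024
	return bytesPerSec / 1e6, bytesPerSec * 8 / 1e9
}

func main() {
	mb, gbit := convert(2891181.71) // transfer rate reported for test 3
	fmt.Printf("%.0f MB/s, %.1f Gbit/s\n", mb, gbit)
}
```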
Try with GOMAXPROCS set to the number of real cores on your test machine (e.g. ignore hyperthread cores).
OK, I did that. Go with GOMAXPROCS=4: 3.6 seconds. Node cluster count 4: 17.5 seconds.
What do you think the difference is between a "real core" and a "hyperthread core"?
An HTT cpu can still only execute one instruction at a time; there are often instructions that cause the cpu to have idle time (stalled waiting for data) and hyperthreading allows for the cpu to spend that otherwise idle thread making progress on a separate task list. However, this still means that the two scheduled threads are contending for the same execution unit... The parent is suggesting that this contest may cause more of a performance degradation than the advantages that HTT provides, which would be easily resolved with some benchmarking :D
Note: the above was a simplification, based on my understanding of HTT CPUs as of about 2008. Apparently things got more complicated in the last 5 years :D The bottom line remains that HTT can cause slowdown in some cases and you should benchmark with it turned off as well.
No, not even close. Each thread on a Haswell CPU, just as an example, has 8 execution ports. Each Haswell core has ten execution units. The CPU can retire way more than one instruction per cycle.
You are absolutely correct, but you could also afford to be a bit more polite. Sentences like "In other words, you have no idea what the difference is" might be true but they're also a bit rude.
They aren't absolutely correct at all, and aaron was actually close to the money. Thrownaway is fundamentally misrepresenting (or misunderstanding) how threads -- in an operating system sense, which is what we are talking about here -- relate to microcode and execution units in a core.
In theory
It's not just theory, you typically get about 3 instructions per cycle in practice.
There is a huge difference between the two. Hyperthreaded cores only give you a speed up in specific situations where additional work can be squeezed into the pipeline.

http://en.wikipedia.org/wiki/Hyper-threading

The speedup is very work dependent and in practice for things like web pages and api servers you generally only get another 20-40% of performance from them rather than a full 100%.

In other words, you have no idea what the difference is.

A hyperthreaded Intel CPU has M functional units and N decode/issue pipelines.

A non-hyperthreaded Intel CPU has M' functional units and N' decode/issue pipelines.

A hyperthreaded Intel CPU with hyperthreading disabled has M functional units and N/2 decode/issue pipelines.

I'd be humored to hear your idea of what the difference is, given the misplaced use of scare-quotes.

A hyperthread core is a virtual core -- it is not actually a core at all but is a re-purposed, possibly stalled physical core. While it can improve some scenarios, in some cases (particularly core-saturating benchmarks) it can actually hurt performance.

This is hardly an out there or controversial statement. Further I didn't say to disable hyperthreading, I said to try setting parallelism to the physical cores. Again, nothing, whatsoever, controversial about that.

It is an "out there" statement because it's entirely, radically incorrect. A processor thread represents a full-blown decode and issue pipeline. A core represents a set of execution resources. Each pipeline can dispatch to any execution unit equally. In case of contention for the same execution unit, one thread issues immediately and the other thread issues next.

If you don't disable hyperthreading, but instead run four threads on an 8-thread CPU, it is extremely likely that the threads will be scheduled on the first two cores/four threads and the other two cores will be shut down, especially on the newer intel CPUs with "turbo" features where this strategy can have large benefits.

The operating system schedules threads across cores, and the processor has zero say in the matter (further, the execution units are primarily to facilitate branch to essentially execute future scenarios). Both Linux and Windows are hyperthread aware, and will schedule threads to physical processors first, then to hyperthread virtual processors (given that it shares resources with the physical core and can sabotage performance).

This is common knowledge, and your laughable obnoxiousness, which anyone who has ever worked with multithreaded code on a HT processor knows is farce, rings pretty ridiculous.

No, the power-aware scheduler in Linux does not work as you describe. On a turbo-capable Intel CPU, if there are N program threads that will fit on M cores where M is less than the total cores on a socket, and the CPU will enter P0 state, then the threads will run on as few cores as possible and the remaining cores will be shut down.
Well it shouldn't matter that much anyway, I don't think modern kernels will put the same process threads on the same physical core.

That's the reason intel tells you to shut hyperthreading off if your operating system doesn't support it.

What tool are you using for your benchmarks?
apache benchmark. I did exactly what the article suggested to benchmark.

http://httpd.apache.org/docs/2.2/programs/ab.html

ab gets very... iffy at anything past moderate loads due to its lack of a multi-threaded design. I'd highly recommend wrk: https://github.com/wg/wrk

Much better at 100 reqs/sec and up.