Hacker News

by AndyKelley 4699 days ago

I did the benchmarks outlined at the end of the article.

Summary of results:

1. Node.js v0.10.15, single worker: 46.2 seconds

2. Node.js v0.10.15, cluster 8 workers using naught: 17.2 seconds

3. Go 1.0.2, GOMAXPROCS left default: 3.5 seconds

4. Go 1.0.2, GOMAXPROCS=8: 3.7 seconds

Detailed results below:

1. Node.js v0.10.15, single worker

  Concurrency Level:      100
  Time taken for tests:   46.217 seconds
  Complete requests:      10000
  Failed requests:        0
  Write errors:           0
  Total transferred:      10486510000 bytes
  HTML transferred:       10485760000 bytes
  Requests per second:    216.37 [#/sec] (mean)
  Time per request:       462.168 [ms] (mean)
  Time per request:       4.622 [ms] (mean, across all concurrent requests)
  Transfer rate:          221580.08 [Kbytes/sec] received
  
  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    0   0.2      0       3
  Processing:   193  461  36.2    450     944
  Waiting:       16  235 127.3    235     534
  Total:        193  461  36.2    450     944
  
  Percentage of the requests served within a certain time (ms)
    50%    450
    66%    467
    75%    470
    80%    486
    90%    492
    95%    514
    98%    517
    99%    545
   100%    944 (longest request)

2. Node.js v0.10.15, cluster 8 workers using naught

  Concurrency Level:      100
  Time taken for tests:   17.199 seconds
  Complete requests:      10000
  Failed requests:        0
  Write errors:           0
  Total transferred:      10486510000 bytes
  HTML transferred:       10485760000 bytes
  Requests per second:    581.41 [#/sec] (mean)
  Time per request:       171.995 [ms] (mean)
  Time per request:       1.720 [ms] (mean, across all concurrent requests)
  Transfer rate:          595408.80 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    0   0.2      0       3
  Processing:     7  171 116.4    149     739
  Waiting:        5   96  81.9     71     710
  Total:          8  171 116.5    150     740

  Percentage of the requests served within a certain time (ms)
    50%    150
    66%    197
    75%    236
    80%    266
    90%    324
    95%    397
    98%    438
    99%    493
   100%    740 (longest request)

3. Go 1.0.2, GOMAXPROCS left default

  Concurrency Level:      100
  Time taken for tests:   3.542 seconds
  Complete requests:      10000
  Failed requests:        0
  Write errors:           0
  Total transferred:      10486730000 bytes
  HTML transferred:       10485760000 bytes
  Requests per second:    2823.16 [#/sec] (mean)
  Time per request:       35.421 [ms] (mean)
  Time per request:       0.354 [ms] (mean, across all concurrent requests)
  Transfer rate:          2891181.71 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    1   0.3      1       3
  Processing:     9   35   2.2     34      56
  Waiting:        0    1   1.3      1      22
  Total:         12   35   2.3     35      57

  Percentage of the requests served within a certain time (ms)
    50%     35
    66%     36
    75%     36
    80%     36
    90%     37
    95%     38
    98%     39
    99%     41
   100%     57 (longest request)

4. Go 1.0.2, GOMAXPROCS=8

  Concurrency Level:      100
  Time taken for tests:   3.657 seconds
  Complete requests:      10000
  Failed requests:        0
  Write errors:           0
  Total transferred:      10486730000 bytes
  HTML transferred:       10485760000 bytes
  Requests per second:    2734.54 [#/sec] (mean)
  Time per request:       36.569 [ms] (mean)
  Time per request:       0.366 [ms] (mean, across all concurrent requests)
  Transfer rate:          2800429.67 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    1   0.4      1       3
  Processing:    19   36   2.5     35      57
  Waiting:        0    1   1.1      1      16
  Total:         20   37   2.5     36      58

  Percentage of the requests served within a certain time (ms)
    50%     36
    66%     37
    75%     37
    80%     37
    90%     38
    95%     39
    98%     42
    99%     51
   100%     58 (longest request)
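As a cross-check on these reports, the summary lines that ab prints are plain averages over the whole run. The Go sketch below recomputes them from the raw counts. It assumes ab's formulas: requests per second = complete requests / time taken; mean time per request = concurrency × time taken / complete requests; transfer rate = total bytes transferred / 1024 / time taken. The `abMetrics` type and `computeABMetrics` function are hypothetical names and are not part of ab.

```go
package main

import "fmt"

// abMetrics holds the summary figures ab prints, recomputed from raw counts.
// Hypothetical helper type for illustration; not part of ab.
type abMetrics struct {
	RequestsPerSec   float64 // Requests per second [#/sec] (mean)
	TimePerReqMs     float64 // Time per request [ms] (mean), per concurrent client
	TimePerReqAllMs  float64 // Time per request [ms] (mean, across all concurrent requests)
	TransferKBPerSec float64 // Transfer rate [Kbytes/sec], ab divides bytes by 1024
}

// computeABMetrics assumes ab's formulas. Every figure is an average over the
// whole run, so a few slow requests barely move them (see the percentiles).
func computeABMetrics(concurrency, requests int, seconds float64, totalBytes int64) abMetrics {
	n := float64(requests)
	return abMetrics{
		RequestsPerSec:   n / seconds,
		TimePerReqMs:     float64(concurrency) * seconds * 1000 / n,
		TimePerReqAllMs:  seconds * 1000 / n,
		TransferKBPerSec: float64(totalBytes) / 1024 / seconds,
	}
}

func main() {
	// Raw numbers from run 1 (Node.js, single worker).
	m := computeABMetrics(100, 10000, 46.217, 10486510000)
	fmt.Printf("%.2f req/s, %.3f ms, %.3f ms, %.2f KB/s\n",
		m.RequestsPerSec, m.TimePerReqMs, m.TimePerReqAllMs, m.TransferKBPerSec)
}
```

Because ab prints the time taken rounded to three decimals while computing with more precision, the recomputed values land within a fraction of a percent of the printed ones rather than matching digit for digit.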

bigdubs 4699 days ago

There were significant performance increases in Go 1.1 (see: http://golang.org/doc/go1.1#performance); I would also suggest you benchmark a more current version.

voidlogic 4699 days ago

When I see someone doing something like this and:

  1. They are not using the latest Go (atm 1.1.2)
  2. GOMAXPROCS is not = the number of CPUs
  3. They are using ab rather than something more scalable like wrk

I assume they either don't know what they are doing, or want to make Go look bad.

On a side note, Go is already known to be much faster at web-serving than node.js: http://www.techempower.com/benchmarks/#section=data-r6&hw=i7...

AndyKelley 4699 days ago

I did the test that the article suggested, with the versions of the tools that I have installed on my system. This is a comment on a blog article, not an attempt to engineer the perfect benchmark.

Also in this bench Go danced circles around node. I dunno what you're complaining about.

voidlogic 4699 days ago

>I dunno what you're complaining about.

I'm complaining because you and other people will go on to:

  1. Use a version of Go in your development that is slower (and has much worse memory characteristics) and lacks new features.
  2. End up running all your programs on a single core until you understand GOMAXPROCS.
  3. Use ab to benchmark real things, which is bad.

So "my complaining" is trying to help you.

bhauer 4699 days ago

I agree with Voidlogic here. Perhaps his tone was a little confrontational, but his intentions were good. :)

    Go 1.1 > Go 1.0.2
    wrk > ab

In particular, ab should be avoided whenever possible. Apache Bench (ab) remains a single-threaded tool, meaning that for high-performance servers in particular, your exercise will run into the limits of Apache Bench before the limits of the server(s) being tested. The lighttpd team has a multi-threaded clone named WeigHTTP that I would recommend if you want something that is functionally similar to ab and uses similar command-line arguments.

Wrk uses a slightly different argument syntax from ab and WeigHTTP but has some upsides:

1. Wrk is also multi-threaded.

2. In our experience, wrk is slightly higher-performance than WeigHTTP (~5 to 10%).

3. Wrk provides average, maximum, and stdev for latency.

4. Wrk provides a time-limited mode (rather than request-count limited), which is appealing for some test types.

In my experience, as long as you configure Go and node to use all of your cores, Go will benchmark better than Node in any permutation of these configuration variables:

    Go 1.0.2 and node tested with ab.
    Go 1.1 and node tested with ab.
    Go 1.0.2 and node tested with wrk.
    Go 1.1 and node tested with wrk.

V8 is a very fast JavaScript runtime; node.js is modestly quick at handling HTTP requests. But among the many features of Go is a high-performance HTTP package. If you've used both, it isn't all that surprising that Go's performance clocks in higher than node.

SkyMarshal 4699 days ago

Is this the wrk you're referring to:

https://github.com/wg/wrk

bhauer 4699 days ago

Yes. Sorry for not providing the link!

voidlogic 4699 days ago

> Perhaps his tone was a little confrontational

Sorry, that was not intended.

>but his intentions were good. :)

They really were...

daemon13 4699 days ago

>> wrk > ab

Can you please expand on why? I recently came across wrk and am in the process of evaluating a switch from ab. Thank you.

voidlogic 4699 days ago

daemon13, you might also be interested in reading this thread: https://news.ycombinator.com/item?id=6114282

bhauer 4699 days ago

Sure. I just edited my message above.

AndyKelley 4699 days ago

Can you help me by telling me why the way I used GOMAXPROCS is wrong, and how to use it correctly?

voidlogic 4699 days ago

Not so much wrong as my impression was you didn't understand it. If that is not the case I apologize.

For example, you said "GOMAXPROCS left default". I don't know how you set your environment variables; are they unset, so the default is 1? You also didn't mention in your post that the GOMAXPROCS=1 and single node worker test cases are really toy test cases (useful only for benchmarking). So if you know everything below, then great! Maybe other people can learn:

GOMAXPROCS is the maximum number of OS-level threads that can execute Go code at the same time; the Go runtime multiplexes Go tasks (goroutines) over those threads.

So if GOMAXPROCS = 1, when one goroutine blocks, another will run, BUT Go code will never run on more than one OS thread at a time, and thus you will never use more than one logical core.

The right GOMAXPROCS setting depends on the application. For example, GOMAXPROCS=1 might be right for a command-line tool or for a program designed to have multiple instances started on the same machine. That being said, the vast majority of the high-load applications I have written perform best with GOMAXPROCS=<# of logical CPU cores>. So Go always has concurrency, but GOMAXPROCS gives it parallelism. GOMAXPROCS > 1 also gives the garbage collector more parallelism.

So for a benchmark like this, we ideally want requests that are made in parallel to be processed in parallel. As a rule of thumb, if you use a Node.js worker cluster, you should probably test Go with GOMAXPROCS set to the same number.

All this being said, depending on your CPU's implementation details, you would sometimes be better off setting both your node worker count and GOMAXPROCS to the number of physical rather than logical cores. Sometimes simultaneous multi-threading (SMT, aka hyperthreading) actually creates more overhead than any concurrency gains it offers.

In short, when testing something like this I would always test:

  1. n = 1 (with a disclaimer note)
  2. n = physical CPU count
  3. n = logical CPU count

where n is the number of GOMAXPROCS/node worker threads.
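The three configurations above can be enumerated automatically. This Linux-only sketch assumes the x86 layout of /proc/cpuinfo, where hyperthread siblings share the same `physical id` and `core id`, so counting distinct pairs gives the physical core count, while `runtime.NumCPU` supplies the logical count. `physicalCores` and `configsToTest` are hypothetical helpers, and the fallback treats every logical CPU as physical when those fields are missing (as on some VMs and non-x86 machines).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
)

// physicalCores counts distinct (physical id, core id) pairs in the text of
// /proc/cpuinfo. Hyperthread siblings share both ids, so they count once.
// Returns 0 when the fields are absent.
func physicalCores(cpuinfo string) int {
	seen := map[string]bool{}
	var phys, core string
	flush := func() {
		if core != "" {
			seen[phys+"/"+core] = true
		}
		phys, core = "", ""
	}
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "" {
			flush() // a blank line ends one processor's block
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "physical id":
			phys = strings.TrimSpace(val)
		case "core id":
			core = strings.TrimSpace(val)
		}
	}
	flush()
	return len(seen)
}

// configsToTest returns n = 1, physical, logical, dropping duplicates
// (e.g. physical == logical on a machine without SMT).
func configsToTest(physical, logical int) []int {
	out := []int{1}
	for _, n := range []int{physical, logical} {
		if n > out[len(out)-1] {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	logical := runtime.NumCPU()
	physical := logical // fallback if /proc/cpuinfo is unavailable
	if data, err := os.ReadFile("/proc/cpuinfo"); err == nil {
		if p := physicalCores(string(data)); p > 0 {
			physical = p
		}
	}
	for _, n := range configsToTest(physical, logical) {
		fmt.Printf("GOMAXPROCS=%d / %d node workers\n", n, n)
	}
}
```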

AndyKelley 4699 days ago

Okay so this is why I was confused by your comment:

I did use GOMAXPROCS with the number of logical cores that I have, and I did test the node cluster with the same number.

melling 4699 days ago

How much of an advantage, if any, does Node provide because it's more "mature", or at least has been used for far longer?

For example, when I go to Stack Overflow I see that Node has far more questions asked:

http://stackoverflow.com/questions/tagged/go

http://stackoverflow.com/questions/tagged/node.js

bsaul 4699 days ago

I'm really uncertain about node being judged more "mature".

Being developed by Thompson and Pike makes you gain something like 30 years of "maturity". Plus, running in production in the Google infrastructure is far more of a proof of maturity than running the chat service of every hackathon project for 2 years.

diminoten 4699 days ago

So do I just email all my questions directly to Thompson and Pike, then?

In this case, language maturity means the quantity of reference material available to help troubleshoot problems as they arise.

ciclista 4699 days ago

The golang-nuts mailing list is extremely active; you'd have no issues finding answers there.

diminoten 4699 days ago

Yeah I don't doubt you'd be able to find answers to any Go question, the point though is that you'll have to work a little harder than with a "more mature" "language" like Node.js.

enraged_camel 4699 days ago

The difference is sending emails to some mailing list and waiting for someone to answer, vs. entering a few keywords into Google and having the top result be the answer you want because someone already asked the same question before.

This is not a stab at Go or its maturity, but rather a realistic assessment of the importance of answers being (nearly) instantly available when you're working on a project or troubleshooting a critical issue.

bsaul 4699 days ago

Agree on that, yet, if you want the big picture, you should multiply that by the probability of having something to troubleshoot, or a critical issue, in the first place.

That would give you a better estimate of the risk you're taking by using that language. And that's where the 30 years of PL design experience of Go's authors becomes important.

You could consider my argument a bit far-fetched, but Go's design has been explicitly focused on keeping things simple and free of surprises, and so far all the reviews seem to agree on that point. That could compensate for the lack of results in Google (especially compared to what node.js forces you to do).

melling 4699 days ago

Mature means many things, including being able to get support, packages, existing code, solutions to common problems, etc. I'm sure Go is a solid project.

threeseed 4699 days ago

Two smart guys writing a programming language instantly gets you 30 years of maturity. Did I just wake up in some bizarro universe or something?

And nobody cares if Go is being used for some tiny, insignificant part of the Google infrastructure. Get back to me when it is used for a stock exchange, betting site or complex web app.

oijaf888 4698 days ago

What stock exchange is powered by node.js?

BitMastro 4698 days ago

dl.google.com handles all the downloads for Google Chrome, Android factory images, Eclipse... not exactly tiny, and especially not insignificant.

Complex web apps:

  CloudFlare: https://www.cloudflare.com/railgun
  BBC: http://www.quora.com/Go-programming-language/Is-Google-Go-re...
  SoundCloud: http://backstage.soundcloud.com/2012/07/go-at-soundcloud/

trimbo 4699 days ago

If this is your requirement, may I suggest:

http://stackoverflow.com/questions/tagged/java

lnanek2 4699 days ago

Much more so than internet posts about it, there are countless modules for NodeJS as well. Need to upload, store, and return images in Mongo using streaming? Done. Need to keep servers running despite errors? Done. Amazon S3? Done. Need an alternate cloud provider? Done. Need login using OAuth/OpenID/whatever? Done. On and on. Most times you need some general-purpose web functionality, there are already a couple of Node modules in that area, if not an entire framework or sample app and tutorial targeted at that area.

flogic 4699 days ago

By now, I'd assume the language runtimes are in the noise when thinking about maturity. The most likely culprits will be the application code, followed by the library code. So the real question should be "Are there any big scary monsters in the libraries I need?", followed by "Which language do I think will best enable me to hit my target?".

razorsese 4699 days ago

Why is this faster?

https://gist.github.com/jdpaton/9f20ff0b13e0cc20017a

arh68 4699 days ago

I did find this bit:

  // When calling .end(buffer) right away, this triggers a "hot path"
  // optimization in http.js, to avoid an extra write call.
  //
  // However, the overhead of copying a large buffer is higher than
  // the overhead of an extra write() call, so the hot path was not
  // always as hot as it could be.
  //
  // Verify that our assumptions are valid.

https://github.com/joyent/node/blob/master/benchmark/http/en...

toong 4699 days ago

Test no. 3 sends a whopping 2891 Mbytes/s over HTTP? I know it's localhost, but wow!
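For scale: ab's "Kbytes" are 1024-byte units, and every response in these runs carries a 1 MiB body (10485760000 HTML bytes / 10000 requests = 1048576 bytes). A one-line conversion, assuming that unit, puts run 3's transfer rate at roughly 23.7 Gbit/s, which is plausible only over loopback. `abKBToGbit` is a hypothetical helper.

```go
package main

import "fmt"

// abKBToGbit converts ab's "Transfer rate" (Kbytes/sec, where ab divides
// bytes by 1024) into gigabits per second (10^9 bits per second).
func abKBToGbit(kbPerSec float64) float64 {
	return kbPerSec * 1024 * 8 / 1e9
}

func main() {
	fmt.Printf("%.2f Gbit/s\n", abKBToGbit(2891181.71)) // run 3; prints "23.68 Gbit/s"
}
```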

corresation 4699 days ago

Try with GOMAXPROCS set to the number of real cores on your test machine (i.e. ignore hyperthread cores).

AndyKelley 4699 days ago

OK, I did GOMAXPROCS=4: 3.6 seconds. Node cluster count 4: 17.5 seconds.

thrownaway2424 4699 days ago

What do you think the difference is between a "real core" and a "hyperthread core"?

aaronblohowiak 4699 days ago

An HTT CPU can still only execute one instruction at a time; there are often instructions that cause the CPU to have idle time (stalled waiting for data), and hyperthreading allows the CPU to spend that otherwise idle time making progress on a separate task list. However, this still means that the two scheduled threads are contending for the same execution unit... The parent is suggesting that this contention may cause more of a performance degradation than the advantages that HTT provides, which would be easily resolved with some benchmarking :D

aaronblohowiak 4699 days ago

note: the above was a simplification based on my understanding of HTT CPUs as of about 2008. Apparently things got more complicated in the last 5 years :D The bottom line remains that HTT can cause slowdowns in some cases, and you should benchmark with it turned off as well.

thrownaway2424 4699 days ago

No, not even close. Each thread on a Haswell CPU, just as an example, has 8 execution ports. Each Haswell core has ten execution units. The CPU can retire way more than one instruction per cycle.

chad_oliver 4699 days ago

You are absolutely correct, but you could also afford to be a bit more polite. Sentences like "In other words, you have no idea what the difference is" might be true but they're also a bit rude.

corresation 4698 days ago

They aren't absolutely correct at all, and aaron was actually close to the money. Thrownaway is fundamentally misrepresenting (or misunderstanding) how threads (in the operating system sense, which is what we are talking about here) relate to microcode and execution units in a core.

AsymetricCom 4699 days ago

In theory

oofabz 4699 days ago

It's not just theory, you typically get about 3 instructions per cycle in practice.

spullara 4699 days ago

There is a huge difference between the two. Hyperthreaded cores only give you a speed up in specific situations where additional work can be squeezed into the pipeline.

http://en.wikipedia.org/wiki/Hyper-threading

The speedup is very work dependent and in practice for things like web pages and api servers you generally only get another 20-40% of performance from them rather than a full 100%.

thrownaway2424 4699 days ago

In other words, you have no idea what the difference is.

A hyperthreaded Intel CPU has M functional units and N decode/issue pipelines.

A non-hyperthreaded Intel CPU has M' functional units and N' decode/issue pipelines.

A hyperthreaded Intel CPU with hyperthreading disabled has M functional units and N/2 decode/issue pipelines.

corresation 4699 days ago

I'd be amused to hear your idea of what the difference is, given the misplaced use of scare quotes.

A hyperthread core is a virtual core -- it is not actually a core at all but is a re-purposed, possibly stalled physical core. While it can improve some scenarios, in some cases (particularly core-saturating benchmarks) it can actually hurt performance.

This is hardly an out there or controversial statement. Further I didn't say to disable hyperthreading, I said to try setting parallelism to the physical cores. Again, nothing, whatsoever, controversial about that.

thrownaway2424 4699 days ago

It is an "out there" statement because it's entirely, radically incorrect. A processor thread represents a full-blown decode and issue pipeline. A core represents a set of execution resources. Each pipeline can dispatch to any execution unit equally. In case of contention for the same execution unit, one thread issues immediately and the other thread issues next.

If you don't disable hyperthreading, but instead run four threads on an 8-thread CPU, it is extremely likely that the threads will be scheduled on the first two cores/four threads and the other two cores will be shut down, especially on the newer intel CPUs with "turbo" features where this strategy can have large benefits.

corresation 4699 days ago

The operating system schedules threads across cores, and the processor has zero say in the matter (further, the execution units are primarily there to facilitate branching, essentially executing future scenarios). Both Linux and Windows are hyperthread-aware, and will schedule threads to physical processors first, then to hyperthread virtual processors (given that a virtual processor shares resources with the physical core and can sabotage performance).

This is common knowledge, and your laughably obnoxious claim, which anyone who has ever worked with multithreaded code on an HT processor knows is a farce, rings pretty ridiculous.

thrownaway2424 4698 days ago

No, the power-aware scheduler in Linux does not work as you describe. On a turbo-capable Intel CPU, if there are N program threads that will fit on M cores where M is less than the total cores on a socket, and the CPU will enter P0 state, then the threads will run on as few cores as possible and the remaining cores will be shut down.

GhotiFish 4699 days ago

Well it shouldn't matter that much anyway, I don't think modern kernels will put the same process threads on the same physical core.

That's the reason intel tells you to shut hyperthreading off if your operating system doesn't support it.

hackerboos 4699 days ago

What tool are you using for your benchmarks?

AndyKelley 4699 days ago

Apache Bench (ab). I did exactly what the article suggested for benchmarking.

http://httpd.apache.org/docs/2.2/programs/ab.html

TylerE 4699 days ago

ab gets very... iffy at anything past moderate loads due to its multi-threaded design (or lack thereof). I'd highly recommend wrk: https://github.com/wg/wrk

Much better at 100 reqs/sec and up.
