by bnoordhuis 4166 days ago
Interesting results, thanks for sharing. I can perhaps shed some light on the performance differences.

> Buffer 4.259 5.006

In v0.10, buffers are sliced off from big chunks of pre-allocated memory. It makes allocating buffers a little cheaper but because each buffer maintains a back pointer to the backing memory, that memory isn't reclaimed until the last buffer is garbage collected.

Buffers in node.js v0.11 and io.js v1.x instead own their memory. It reduces peak memory (because memory is no longer allocated in big chunks) and removes a whole class of accidental memory leaks.
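To make the retention effect concrete, here is a minimal sketch. It uses standard typed arrays as a stand-in for the allocator, not node's actual internals; `SLAB_SIZE`, `allocFromSlab` and `allocOwned` are hypothetical names made up for illustration.

```javascript
// Sketch only: plain typed arrays stand in for node's internal allocator.
const SLAB_SIZE = 8 * 1024; // hypothetical slab size, chosen for illustration

// v0.10-style: a small buffer is a view carved out of a big pre-allocated slab.
function allocFromSlab(slab, offset, length) {
  return slab.subarray(offset, offset + length); // shares the slab's memory
}

// v0.11 / io.js-style: each buffer owns exactly the memory it needs.
function allocOwned(length) {
  return new Uint8Array(length);
}

const slab = new Uint8Array(SLAB_SIZE);
const sliced = allocFromSlab(slab, 0, 16);
const owned = allocOwned(16);

// The 16-byte view keeps the whole slab reachable through its backing store,
// while the owned buffer only keeps its own 16 bytes alive.
const slicedBackingBytes = sliced.buffer.byteLength;
const ownedBackingBytes = owned.buffer.byteLength;
console.log(slicedBackingBytes, ownedBackingBytes);
```

As long as `sliced` is reachable, the entire slab stays in memory, which is the kind of accidental leak described above.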

That said, the fact that it's sometimes slower is definitely something to look into.

> Typed-Array 4.944 11.555

Typed arrays in v0.10 are a homegrown and non-conforming implementation.

Node.js v0.11 and io.js v1.x use V8's native typed arrays, which are indeed slower at this point. I know the V8 people are working on them; it's probably just a matter of time, although more eyeballs certainly won't hurt.

> Regular Array 40.416 7.359

Full credit goes to the V8 team for that one. :-)


I can explain what happened in the Array case. 100000 used to be the threshold at which new Array(N) or arr.length = N started to return a dictionary-backed array. Not anymore: this was changed by https://codereview.chromium.org/397593008 - now new Array(100001) returns a fast-elements array.

I will check out what happened to Buffer/TypedArray. Should not degrade that much unless something really goes south here.

Ok reporting back. There are two issues here.

The first major one is related to the mortality of TypedArray maps (aka hidden classes). When the typed array stored in the Data variable is GCed and there are no other Uint8Arrays in the heap, its hidden class is GCed too. This also causes the GC to find and discard all optimized code that is specialized for Uint8Arrays and to clear all type feedback related to Uint8Arrays from inline caches. When we later come back and reoptimize, the optimizing compiler thinks that cleared type feedback means we need to emit a generic access through the IC (there is reasoning behind that) because this is potentially going to be a polymorphic access anyway. I have filed an issue[1] for the root cause (mortality of the typed array's hidden class).

Now there is a second, much smaller issue (which also explains the performance of the Buffer case) - apparently there were some changes in the optimization thresholds and OSR (on-stack replacement) heuristics. After these changes we hit OSR at a different moment: e.g. I can see that we hit the inner loop, the one that loops over `j`, instead of hitting the outer loop, which would lead to better code. In V8, OSR is implemented in a way that tries to produce optimized code that is suitable both for OSR and as normal function code - this is done by adding a special OSR entry block that jumps to the preheader of the loop we are targeting with the OSR. This allows V8 to reuse the same optimized code without optimizing it again for the normal entry - but it also leads to code quality issues if OSR does not hit the outer loop, because the OSR entry block inhibits code motion. This is a known problem and there are plans to fix it. The hit is usually quite small unless you have very tight nested loops (like in this case).
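For readers who have not seen the benchmark, the loop shape in question looks roughly like the sketch below. This is a hypothetical reconstruction of a sieve, not the benchmarked code (which is not shown in this thread); the name `sieve` and the variable names other than `j` are assumptions.

```javascript
// Hypothetical reconstruction of a sieve loop; not the actual benchmark code.
function sieve(limit) {
  const composite = new Uint8Array(limit + 1); // 0 = possibly prime, 1 = composite
  let count = 0;
  // Outer loop: per the comment above, OSR entering here is the better case.
  for (let i = 2; i <= limit; i++) {
    if (composite[i] === 0) {
      count++;
      // Inner loop over j: very tight, so an OSR entry block placed here
      // can get in the way of code motion.
      for (let j = i * 2; j <= limit; j += i) {
        composite[j] = 1;
      }
    }
  }
  return count; // number of primes <= limit
}

const primesUpTo100 = sieve(100);
console.log(primesUpTo100);
```

Which loop V8 actually picks for OSR depends on its heuristics and version; the sketch only shows where the two candidate loops are.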

Disabling OSR (--nouse-osr) not only "solves" the second issue but also partially fixes (hides) the first issue: 1) we no longer optimize with partial type feedback - so we never emit a generic keyed access but always specialize it for the typed array; 2) we no longer emit an OSR entry - hence no code quality issues related to it.

[1] https://code.google.com/p/v8/issues/detail?id=3824

Very interesting. After reading your comment, I tried allocating another Uint8Array and keeping it allocated throughout the entire test as a workaround for the issue you mentioned. Time for Node.js was unchanged, but io.js was down to about 5.5s now. Almost the same time as Node. Only about 10% slower.

The same happens when I use the --nouse-osr parameter that you mentioned.
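The keep-alive workaround amounts to something like the following sketch. `keepAlive` and `runOnce` are hypothetical names, and `runOnce` is only a stand-in for one benchmark iteration, not the real test.

```javascript
// Workaround sketch: one long-lived Uint8Array, so that a Uint8Array is always
// present in the heap and (per the explanation above) the hidden class for
// typed arrays is not collected between runs.
const keepAlive = new Uint8Array(1); // held at module scope for the whole run

// Stand-in for one benchmark iteration (hypothetical): allocates a fresh
// typed array that becomes garbage as soon as the function returns.
function runOnce(size) {
  const data = new Uint8Array(size);
  let sum = 0;
  for (let i = 0; i < size; i++) {
    data[i] = i & 1;
    sum += data[i];
  }
  return sum;
}

let total = 0;
for (let run = 0; run < 5; run++) {
  total += runOnce(1000);
}
console.log(total, keepAlive.length);
```

The only thing that matters is that `keepAlive` stays reachable for the whole test; its size and contents are irrelevant.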

Is it 10% slower even if you keep the array alive and apply --nouse-osr (to both node.js and io.js)?

On my machine results are fluctuating within the same ballpark (though I am on Linux and benchmarking 64-bit builds).

Ok, I hadn't tested with both before. Keeping the array alive and using --nouse-osr makes io.js only 2.3% slower than my original measurement for Node 0.10.35. Median of 5058ms.

And Node 0.10.35 shows basically the same results as before. I see less than 1% difference. Maybe just random fluctuation. Even if not, 1% is irrelevant.

I just posted a follow-up blogpost, comparing Node 0.11.15 and io.js 1.0.3 which were both released yesterday.

In that post I also benchmarked the various fixes for the typed-array slowdown you mentioned. BTW --nouse-osr makes all three tests run faster.

http://geekregator.com/2015-01-21-node_js_0_11_15_and_io_js_...

Thanks for the update.

I posted this reply on your site, but I will duplicate it here for the sake of HN readers:

> BTW --nouse-osr makes all three tests run faster.

As I tried to explain above: OSR as it is implemented now impacts code quality depending on which loop OSR hits, which in turn depends on the heuristics that V8 uses. These heuristics are slightly different in newer V8. As a result of these changes V8 hits the inner loop instead of the outer loop. This leads to worse code.

Code that benefits from OSR is code that contains a loop which a) can be well optimized, b) runs long, and c) is run only a few times in total. The Sieve benchmark is the opposite of this and as a result it doesn't benefit from OSR - you get a bigger penalty from producing worse code and no benefit from optimizing slightly earlier.
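To illustrate the two call shapes being contrasted, here is a rough sketch. The function names and iteration counts are made up for illustration, and whether V8 actually triggers OSR for either one depends on its heuristics.

```javascript
// Shape that can benefit from OSR: called once, with one long-running loop.
// Without OSR the function would stay in unoptimized code until it returns.
function longRunningOnce(n) {
  let acc = 0;
  for (let i = 0; i < n; i++) {
    acc = (acc + i) % 1000003;
  }
  return acc;
}

// Opposite shape: a short loop in a function that is called many times.
// It can be optimized through its normal entry on a later call, so entering
// optimized code mid-loop buys little.
function shortCalledOften(n) {
  let acc = 0;
  for (let i = 0; i < n; i++) {
    acc += i;
  }
  return acc;
}

const once = longRunningOnce(1000000); // single call, long loop
let often = 0;
for (let k = 0; k < 1000; k++) {
  often = shortCalledOften(100); // many calls, short loop each
}
console.log(once, often);
```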

Not using OSR for Sieve also hides the other issue with the mortality of typed arrays' hidden classes. I say "hides", not "fixes", because one can easily construct a benchmark where the mortality would still be an observable performance issue even if the benchmark itself is run without OSR: https://gist.github.com/mraleph/2942a14ef2a480e2a7a9

Does the dramatic speed difference between the "non-conforming" implementation and V8 mean that current Node typed arrays are not memory-safe and you may get C-style buffer overflow vulnerabilities when using them?

"Non-conforming" only means they didn't completely adhere to the ES specification. There should be no possibility of buffer overflow.

FWIW I also did a test of Node:master and that performance was within 2% of what I measured for io.js.

Interesting background about typed-arrays. I didn't know that. Thanks!

> FWIW I also did a test of Node:master and that performance was within 2% of what I measured for io.js.

It would have been a good thing to include that comment in the article as well.

I thought about that. But it would have diminished the point I was trying to make: Always test with different versions as performance may differ by a LOT.

Well, if the point is that they are different then I see what you're saying but in actual fact the point seems to be they're almost equal.

He was talking about differences between point versions. 0.10 has dramatically different performance to 0.11. io.js is using 0.11, as is node:master, but older node was using 0.10.

I.e. the difference isn't necessarily node vs io, it's one point release of V8 to the next as used by node and io.

Supremely bad title then.

Yes I think it needs to be noted that V8 in node 0.10 is very very far behind when you take into account how quick the pace of development is. I would be interested to see these comparisons with bleeding edge node vs stable node.

Thanks for the details! It makes a lot of sense.

Now I wonder how node 0.11.x compares to iojs :)