by mraleph 5085 days ago
> This is most significant in V8 which does not use NaNboxing, so when it sees a big integer it makes it a boxed double

NaN-tagging has nothing to do with the issue you are linking to; e.g. V8 does not allocate a boxed double for every floating point value it reads from a Float64Array.

Let me expand on my own comments [vegorov@chromium.org] from the issue, so that the problem V8 has becomes more understandable.

The reason for Issue 2097 is the somewhat "patchy" support V8 has for uint32 values in optimized code. A uint32 value in JavaScript has something of a dual nature: it can be treated bitwise as a signed int32 as long as it participates only in truncating operations[1]; but once it escapes into an unsafe place where its "unsigned-ness" can be observed, it might have to become a full double if it does not fit into the positive part of the int32 range.
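To make the dual nature concrete, here is a minimal sketch at the language level (not taken from the issue; the variable names are made up for illustration). It views one 32-bit pattern as both uint32 and int32 and shows a use where the two agree and uses where they differ.

```javascript
// One 4-byte buffer, viewed as unsigned and as signed 32-bit integers.
const buffer = new ArrayBuffer(4);
const unsignedView = new Uint32Array(buffer);
const signedView = new Int32Array(buffer);

unsignedView[0] = 0xffffffff;
const u = unsignedView[0]; // 4294967295
const s = signedView[0];   // -1 (same bits, signed interpretation)

// Truncating use: a bitwise op only looks at the low 32 bits, so both agree.
const lowByteU = u & 0xff; // 255
const lowByteS = s & 0xff; // 255

// Non-truncating uses: the "unsigned-ness" is observable here.
const halfU = u / 2;       // 2147483647.5
const halfS = s / 2;       // -0.5
const positiveU = u > 0;   // true
const positiveS = s > 0;   // false
```

In the first case an int32 representation would be enough; in the other cases the value has to behave like the number 4294967295, which does not fit in int32.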

V8's optimizing pipeline does not have a proper analysis to determine where a uint32 value can efficiently be represented as int32 without affecting semantics. It also does not want to pessimistically represent all values read from a Uint32Array as doubles, so it just chooses to assume a non-truncating int32 representation for them, and as a result the code deoptimizes when it sees a value that does not fit [another bit that is missing here is type feedback on the keyed load IC that would tell hydrogen that it is too optimistic in its assumption].

V8 actually has the same problem with x >>> 0.
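A small sketch of which values break the int32 assumption, for both sources mentioned above (Uint32Array loads and x >>> 0). This only checks the value-level condition; it says nothing about what a particular V8 version does internally, and fitsInInt32 is a hypothetical helper name.

```javascript
// Value-level check only: does the number survive a round trip through int32?
function fitsInInt32(value) {
  return value === (value | 0);
}

// Source 1: elements read from a Uint32Array.
const data = new Uint32Array([1, 0x7fffffff, 0x80000000, 0xffffffff]);
const outOfRange = Array.from(data).filter((v) => !fitsInInt32(v));
// outOfRange holds 2147483648 and 4294967295

// Source 2: the result of x >>> 0, which is always a uint32.
const small = 5 >>> 0;  // 5, fits in int32
const large = -1 >>> 0; // 4294967295, does not fit

console.log(outOfRange, fitsInInt32(small), fitsInInt32(large));
```

Anything at or above 2^31 is where optimized code that assumed int32 would have to give up.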

[1] - examples of truncating operations are: bitwise ops, stores into integer typed arrays, and arithmetic operations when they do not overflow beyond 53 bits and their results are used only in truncating operations.
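The three kinds of truncating operation from the footnote can be sketched like this (illustrative names only). In each case the uint32 value and its int32 reinterpretation produce the same result, which is why an int32 representation is safe there.

```javascript
const asUint32 = 0xffffffff;  // 4294967295
const asInt32 = asUint32 | 0; // -1, same low 32 bits

// 1. Bitwise ops: operands are converted to int32 first.
const xorU = asUint32 ^ 0x0f0f0f0f;
const xorS = asInt32 ^ 0x0f0f0f0f;

// 2. Stores into integer typed arrays: the store itself truncates.
const cells = new Int32Array(2);
cells[0] = asUint32;
cells[1] = asInt32;

// 3. Arithmetic whose result is used only by a truncating operation
//    (and stays well below 2^53, so no precision is lost before truncation).
const sumU = (asUint32 + 1) | 0; // 0
const sumS = (asInt32 + 1) | 0;  // 0

console.log(xorU === xorS, cells[0] === cells[1], sumU === sumS);
```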


Interesting, thanks. I am not sure I fully follow, though: Why is v8 so much slower on this than other engines that also do not have special code to handle uint32s? We are talking slowdown factors of 10x-15x, so it isn't a minor optimization here or there - what is the cause?
Optimized code can't deal with a uint32 that is out of range, so it deoptimizes and you effectively end up running entirely unoptimized code, which is obviously much slower than optimized code... and of course this penalizes V8 even more because unboxed doubles and int32s are possible for V8 only in optimized code.

It is true that if V8 were using NaN-tagging it would suffer less from running entirely unoptimized code. But my point is: it doesn't have to use NaN-tagging to run this code efficiently, it just needs to ensure it doesn't deoptimize for nothing.

Ok, I see.

So, out of the 10-15x slowdown, how much is due to deoptimizing and how much is due to not NaNboxing, if it's possible to estimate that? The distinction should be important in the case of a very large codebase whose performance is not focused in a few small loops, so presumably most of the time you will be running unoptimized code (and then if you NaNbox or not gets important).

I am not sure I entirely understand. If you are running cold code then performance does not matter and you can tolerate quickly allocating a small number of boxes, which will be reclaimed just as quickly by the scavenger once you are done with them. If you are running hot code, then it should be optimized in a way that minimizes the number of boxes produced.

In other words: ideally an application should be running unoptimized code if and only if that code is either cold or cannot be improved by optimization; all other cases are bugs.

I can't split the 10-15x between deoptimization and boxing because for V8 the cost of an "erroneous" deoptimization includes the cost of boxing, as you can't have unboxed numbers in unoptimized code.

As I said earlier, it is true that non-optimized code heavily manipulating doubles could become faster if V8 used NaN-tagging (or another technique that would allow it to maintain unboxed doubles on unoptimized frames). But the speed of unoptimized code should not matter (see above).

Another thing to keep in mind is that for NaN-tagging on ia32 you pay with memory overhead: every object slot that can contain a primitive number becomes twice as large on ia32. This is not nice if you don't have a lot of numbers floating around.

Overall, let me reiterate: I am not arguing against NaN-tagging. I am just clarifying that Issue 2097 is caused by a wrong decision in the hydrogen pipeline, not by the fact that V8 does not use NaN-tagging.

I see now what you are saying about that issue: not NaNboxing makes it worse, but at its core it is a deoptimization issue. Which is good; I hope this is fixed soon (so emscripten-compiled code runs more consistently across browsers).

> I am not sure I entirely understand. If you are running cold code then performance does not matter and you can tolerate quickly allocating a small number of boxes, which will be reclaimed just as quickly by the scavenger once you are done with them. If you are running hot code, then it should be optimized in a way that minimizes the number of boxes produced.

Let's say that performance matters in the application, but it is huge in code size and all the code matters, not a few small parts. Would you call all the code hot, and would v8 optimize the entire application? (i.e., how is 'hot' defined in v8?)

Hot is currently defined as "function called more than X times" and "function contains a loop that took a backedge more than Y times". So it is defined on a per-function basis.
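As a toy model of that per-function rule (not V8's actual implementation): the thresholds below stand in for X and Y and are made-up numbers, the class and method names are hypothetical, and treating the two conditions as alternatives is my reading of the description.

```javascript
// Hypothetical stand-ins for the X and Y thresholds.
const CALL_THRESHOLD = 3;
const BACKEDGE_THRESHOLD = 100;

// One counter object per function, since hotness is tracked per function.
class HotnessCounter {
  constructor() {
    this.calls = 0;
    this.backEdges = 0;
  }
  onCall() {
    this.calls += 1;
  }
  onBackEdge() {
    this.backEdges += 1;
  }
  isHot() {
    return this.calls > CALL_THRESHOLD || this.backEdges > BACKEDGE_THRESHOLD;
  }
}

// A function called a few times with a short loop stays cold...
const rarelyCalled = new HotnessCounter();
for (let i = 0; i < 3; i++) rarelyCalled.onCall();

// ...while a single call containing a long-running loop becomes hot.
const longLoop = new HotnessCounter();
longLoop.onCall();
for (let i = 0; i < 101; i++) longLoop.onBackEdge();

console.log(rarelyCalled.isHot(), longLoop.isHot());
```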

I can hardly speculate how V8 will behave on some abstract application. That really depends heavily on what the code looks like. But ultimately V8 will try to optimize everything that falls under the criteria outlined above.