by DannyBee 4863 days ago
Yes, you could do a background thread, with some caveats:

1. On most current CPUs, this will cause really bad cache/memory thrashing, probably enough to impact the program.

2. This may actually cause significant slowdown, depending on how long it takes to optimize a given set of code (i.e., it may be better to spend 100ms paused optimizing than 5000ms in the background). This is, of course, a latency issue.

3. State of the art for most JITs is still to use one thread. The number of folks doing actual parallel code generation is nil. So sadly, even if you had 4 cores with 3 of them idle, you would still, at best, get to use one of them for the background thread doing the optimizing. There are parts that are trivial to parallelize if you've structured the JIT "right", but they aren't always the parts that are slow.
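
To put rough numbers on point 2, here is a back-of-the-envelope model in Python. Everything in it is an assumption for illustration: the function names, the abstract "work units", the interpreted and compiled rates, and the `contention` factor standing in for the cache thrashing from point 1. Only the 100ms and 5000ms figures come from the comment above.

```python
def paused_total(work, compiled_rate, pause_s):
    """Stop the program, optimize, then run everything as compiled code."""
    return pause_s + work / compiled_rate


def background_total(work, interp_rate, compiled_rate, compile_s, contention=0.0):
    """Keep interpreting while a background thread optimizes.

    contention is the (assumed) fraction of interpreter throughput lost
    to sharing caches and memory bandwidth with the compiler thread.
    """
    effective_rate = interp_rate * (1.0 - contention)
    done_while_compiling = effective_rate * compile_s
    if done_while_compiling >= work:
        # The program finishes before the optimized code is ever ready.
        return work / effective_rate
    # The rest of the work runs as compiled code after the swap.
    return compile_s + (work - done_while_compiling) / compiled_rate


# Hypothetical workload: 20 units, interpreter at 1 unit/s, compiled at 10 units/s.
paused = paused_total(work=20, compiled_rate=10, pause_s=0.1)
background = background_total(work=20, interp_rate=1, compiled_rate=10, compile_s=5.0)
background_contended = background_total(
    work=20, interp_rate=1, compiled_rate=10, compile_s=5.0, contention=0.2
)
print(f"paused: {paused:.1f}s, background: {background:.1f}s, "
      f"background with contention: {background_contended:.1f}s")
```

With these made-up rates, pausing finishes in about 2.1s while background compilation takes about 6.5s, and contention only widens the gap. The model ignores the user-visible stall during the pause, which is the latency side of the trade-off.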


Background compilation in a separate thread actually works pretty well. IE9 has been shipping it with Chakra for a while, and Firefox is now getting it (and it improved the benchmarks a lot, especially on ARM).
Good to hear it's gotten better. Admittedly, I wasn't thinking about browser based JITs when I said that :)

I'm actually curious if you have any stats on how much of the time this is being done on actual busy machines where it's going to compete for L1/etc resources vs how often it's able to be offloaded onto an otherwise empty core.

I.e., I expect there to be a significant difference in the use cases for JITs like PyPy, which are probably going to sit on shared servers that folks are trying to maximize utilization of, vs desktops where I imagine most browsing probably doesn't use all cores at 100%.

> Admittedly, I wasn't thinking about browser based JITs when I said that :)

Don't HotSpot and JRockit also do background (de)compilation & swapping of generated code?

Yes, but in HotSpot's case I cannot remember if it is actually turned on in both "server" and "client".
Aren't server and client now merged with tiered compilation in HotSpot?
No, AFAIK. "Tiered compilation, introduced in Java SE 7, brings client startup speeds to the server VM. ... Tiered compilation is now the default mode for the server VM."

Again, AFAIK, the server VM is still tuned significantly differently from the client VM. In particular, it runs some significantly more complex optimizations that the client VM does not.

ad 1) Hm, this seems to be a good point, but what about the following line of thinking: some thread A interprets a program P, while another thread B compiles P to native machine code (P'). Now, if another thread C started executing P' (taking the data/snapshot from A), then C's caches should build up and remain accurate. Of course, if this happens too often, then the caching behavior will be shitty. I always wondered (based on my interest in interpretation) how many I-cache misses are caused by the instruction cache flushes that follow inline caching in native machine code. (If you have some data on that, please let me know.)
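
A toy Python sketch of that hand-off, in case it helps make the idea concrete. It is not how any real JIT does it: `HotFunction`, `HOT_THRESHOLD`, and the "compiler" (which just returns a faster closure) are all hypothetical, and the swap happens in the calling thread rather than in a dedicated thread C. It only shows the shape: keep running the slow version, build the fast one in a background thread, then switch by replacing a single reference.

```python
import threading

HOT_THRESHOLD = 3  # hypothetical: number of calls before we request optimization


class HotFunction:
    """Runs an 'interpreted' version until a background thread supplies a faster one."""

    def __init__(self, interpreted, compile_fn):
        self._impl = interpreted        # implementation currently in use
        self._compile_fn = compile_fn   # stand-in for the optimizing compiler
        self._calls = 0
        self._requested = False
        self._lock = threading.Lock()
        self.compiled = threading.Event()

    def __call__(self, *args):
        with self._lock:
            self._calls += 1
            start = self._calls >= HOT_THRESHOLD and not self._requested
            if start:
                self._requested = True
        if start:
            # Thread B: compile in the background while callers keep running.
            threading.Thread(target=self._compile, daemon=True).start()
        return self._impl(*args)

    def _compile(self):
        optimized = self._compile_fn()
        # One reference assignment: each call sees either the old or the new code.
        self._impl = optimized
        self.compiled.set()


def interpreted_sum(n):
    """Slow path: sum 0..n-1 with a loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def compile_sum():
    """Pretend compiler: returns a closed-form version of interpreted_sum."""
    return lambda n: n * (n - 1) // 2


hot_sum = HotFunction(interpreted_sum, compile_sum)
results = [hot_sum(10) for _ in range(5)]
hot_sum.compiled.wait(timeout=5)    # wait for the background compile to land
results.append(hot_sum(10))         # this call uses the swapped-in version
```

Callers never block on the compiler here; the cost is that the first calls after the threshold still run the slow path, and nothing in this sketch models the cache effects being discussed.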

ad 3) I am well aware of that. However, I remember that at PLDI'11 there was a talk from Univ. of Edinburgh chaps doing parallel trace-based dynamic binary translation. Obviously, DBT is less work than a high level, full-blown JIT, but at least it's not nil :)