CPU core estimation with JavaScript (blog.wg.oftn.org)
44 points by rakeshpai 4771 days ago
20 comments

This is really difficult to do. For a number of reasons, speed does not necessarily scale proportionately with the number of cores:

(1) Intel Turbo Boost, which will run a single core at higher frequency than multiple cores (to make matters worse, this is also dependent on the CPU's temperature).

(2) Hyperthreading giving you anywhere between a 2x speedup and none at all, depending on what your code is doing under the hood.

(3) NUMA giving, well, non-uniform speedups for some memory access patterns.

(4) Background processes occupying 1+ cores.

On 3 different Lenovo T420s (dual-core i5 with hyperthreading):

  * Linux Mint 14 (x64): 6 cores
  * Arch Linux (x64) machine #1: 6 cores
  * Arch Linux (x64) machine #2: 14 cores
And on an HP desktop, also dual-core i5:

  * Windows 7 (x64) on Firefox: 3 cores
  * Windows 7 (x64) on IE: no result at all, demo hangs
Back to the drawing board I'm afraid...
I have just committed a bunch of improvements that should greatly increase the accuracy. Could you please try running the demo again?
Thanks for the results, they really help us. We have a lot more coming up to improve Core Estimator's accuracy.
A few more data points: MacBook Air, Core i5, dual core with hyperthreading.

Chrome 27: 1 or 2 cores. Chrome 26 gave me 3 a few times.

Firefox 21: 8-16 cores. One time out of four, the test goes up to testing 32 cores, but the CPU activity drops abruptly and the test hangs (the button text remains a greyed-out "Running").

Safari 6.04: 1 core. Rarely two.

The thing that surprised me most is that on my colleague's system (the second Arch x64 one), the estimator went up to 16 cores before dropping back to 14, while my own system (Mint 14) and the other Arch system both went to 8, then back to 6.
on my i7 (4 physical, 8 virtual cores, win xp 64bit, opera), it's now at 256 cores and testing for 512.. no end in sight ^^
ernesth and I have the same problem on Opera. I'm looking into why that is.
Might it be because Opera doesn't use multiple threads for web workers?
I get 2-3 cores for a quad-core i7, Chrome on Windows 7. It should probably be 8.
On the other side of the scale, my 2720QM (4 cores + hyperthreading) came back as 24 cores. Man, if only...
T420, I7 with 2 Cores, HT and TB. Firefox on Ubuntu reports 6 cores.
Multiple tries on T420 -> 32 cores. I wish!
Dual core Intel Core i5 MacBookPro8,1 running Chrome 26.0.1410.43 (OS X 10.8.2), with lots of tabs open. The script varies between 1 and 4 cores. Usually 1 core. It is more likely to report 3 or 4 cores if I scroll up & down rapidly.
Well, it said 9 cores on my laptop with 2 (plus hyperthreading, so 4).

But why even try to estimate the number of cores? Just call it a benchmark that tries to find the number of threads you should run.

On some machines it is likely that the result you get the second time will be different from the first.

And you're right, we designed the script to find the optimum number of web workers to use in parallel; that's its primary purpose. But we also want to drive adoption of navigator.cores, so we decided to call it a core estimator.
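
For what it's worth, the "benchmark for the right number of workers" framing can be sketched roughly as follows. This is not the Core Estimator algorithm, just one simple way to do it: assume you have already timed the same fixed workload with 1, 2, 4, ... workers, then keep the last count that still gave a worthwhile speedup. The function name, the `minGain` threshold and the sample timings are all invented for illustration.

```javascript
// Hypothetical helper, not part of Core Estimator.
// samples: array of { workers, ms } sorted by ascending worker count,
//          where ms is the wall-clock time for the same fixed workload.
// minGain: minimum relative speedup (0.1 = 10%) needed to justify more workers.
function pickWorkerCount(samples, minGain = 0.1) {
  if (samples.length === 0) {
    throw new Error("need at least one timing sample");
  }
  let best = samples[0];
  for (const sample of samples.slice(1)) {
    const gain = (best.ms - sample.ms) / best.ms;
    if (gain < minGain) {
      break; // adding workers no longer helps enough, so stop here
    }
    best = sample;
  }
  return best.workers;
}

// Invented timings: scaling flattens out after 4 workers.
const timings = [
  { workers: 1, ms: 1000 },
  { workers: 2, ms: 520 },
  { workers: 4, ms: 300 },
  { workers: 8, ms: 290 },
  { workers: 16, ms: 310 },
];
console.log(pickWorkerCount(timings)); // 4
```

Note that the answer is "how many workers were worth running during this measurement", which is exactly why it can differ from the physical core count and from one run to the next.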

There is usually a lot of random stuff happening on the average user's machine. Could an anti-virus scan start in the middle of this test and fudge up the result?
Yeah. There are a lot of outside influences that can affect the results of the estimation. Unfortunately this is the only way to go until browser vendors adopt navigator.cores. Future versions will be able to cope better with this by storing results in localStorage on the user's machine and averaging them out.
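
A minimal sketch of what that localStorage smoothing might look like, under my own assumptions: the storage key, the history length and the helper names are made up, and I've used a median rather than a plain average because a single wild run (say, 256 cores) would otherwise drag the result a long way. The storage object is passed in so the same code works with `window.localStorage` or with the in-memory stand-in used here.

```javascript
// Hypothetical sketch; the key name and history size are assumptions.
const STORAGE_KEY = "coreEstimates";
const MAX_HISTORY = 10;

// Median of a non-empty list, rounded to a whole number of cores.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}

// storage: anything with getItem/setItem, e.g. window.localStorage.
// Appends the new estimate to the stored history and returns the smoothed value.
function recordEstimate(storage, estimate) {
  const raw = storage.getItem(STORAGE_KEY);
  const history = raw ? JSON.parse(raw) : [];
  history.push(estimate);
  const recent = history.slice(-MAX_HISTORY); // keep only the latest runs
  storage.setItem(STORAGE_KEY, JSON.stringify(recent));
  return median(recent);
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => data.set(key, String(value)),
  };
}

// Invented run history with one outlier.
const storage = createMemoryStorage();
recordEstimate(storage, 4);
recordEstimate(storage, 256);
console.log(recordEstimate(storage, 4)); // 4
```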
In the long run, an API like navigator.cores is not what we actually want. How do you count cores in a world of SMT, shared caches, shared memory, and unrelated processes? Ideally, we want a higher-level API which allows the underlying system to dynamically adjust the level of parallelism.
So this 10 second test is supposed to run... when a page loads? It's a good effort, but we really need a native solution to this issue. Or maybe just approach it by assuming a minimum of 2 cores (most modern computers have at least two).

Also, I thought browsers didn't offer more thread access...

In practice, when people use our library, we recommend that the test be called only when it's needed, rather than as soon as the page loads. There are other guidelines as well, such as always allowing the user to manually edit the number of cores used in an app.
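
Those guidelines could be wired together along these lines. This is only a sketch under assumptions, not the library's API: the function and option names are hypothetical, and the fallback of 2 simply borrows the "at least two cores" suggestion from the parent comment.

```javascript
// Hypothetical helper; none of these names come from the Core Estimator library.
const FALLBACK_WORKERS = 2; // borrowed from the "at least two cores" suggestion

function resolveWorkerCount({ userChoice, estimate } = {}) {
  const isValid = (n) => Number.isInteger(n) && n >= 1;
  if (isValid(userChoice)) return userChoice; // the user's setting always wins
  if (isValid(estimate)) return estimate;     // otherwise use the measured estimate
  return FALLBACK_WORKERS;                    // nothing usable: fixed default
}

console.log(resolveWorkerCount({ userChoice: 3, estimate: 14 })); // 3
console.log(resolveWorkerCount({ estimate: 6 }));                 // 6
console.log(resolveWorkerCount());                                // 2
```

The estimate would only be filled in once the app actually needs workers, so the slow test never has to run at page load.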
Demo does not seem to work in Opera(1): I have stopped it when it tried 512 workers (after 15 minutes). And since only one of my 2 cores was ever working, I doubt the result would have been accurate.

(1): Opera 12.15, Debian GNU/Linux, Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz.

Weird. Core Estimator performs very strangely on Opera, we're trying to find out why.
Did not work very well for me: my 3930K at 4.5 GHz was reported as having 3 cores. Monitoring the CPU usage, it only seemed to affect a couple of the cores; most of them remained idle while the test was running.

Edit: worked quite well on my nexus 4. Got 4 cores on the first test.

My iPhone 4S came back with 2 cores consistently. It took ~25 seconds each time.

If the goal is to find the sweet spot for an algorithm, wouldn't it be best to run several small tests of the specific algorithm with different optimization values?
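
Something like the following is what I have in mind, as a rough sketch only. `findBestSetting`, `timeOnce` and the toy workload are names I made up; the idea is to time your real workload a few times at each candidate setting and keep the setting with the lowest typical time. `Date.now()` is coarse, so each trial needs to run for at least a few milliseconds to be meaningful.

```javascript
// Hypothetical sketch: benchmark the workload itself at several settings.
function timeOnce(run, value) {
  const start = Date.now();
  run(value);
  return Date.now() - start;
}

// candidates: settings to try (worker counts, chunk sizes, ...).
// run: the workload, called with one candidate value.
// measure: injectable timer so the selection logic can be tested deterministically.
function findBestSetting(candidates, run, repeats = 5, measure = timeOnce) {
  let best = null;
  for (const value of candidates) {
    const times = [];
    for (let i = 0; i < repeats; i++) {
      times.push(measure(run, value));
    }
    times.sort((a, b) => a - b);
    // Middle element of the sorted times: damps one-off spikes from background activity.
    const typical = times[Math.floor(times.length / 2)];
    if (best === null || typical < best.ms) {
      best = { value, ms: typical };
    }
  }
  return best; // null if there were no candidates
}

// Toy workload standing in for the real algorithm.
const workload = (step) => {
  let total = 0;
  for (let i = 0; i < 200000; i += step) total += i;
  return total;
};
console.log(findBestSetting([1, 2, 4, 8], workload));
```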

This thing crashed my PC, that's why I'm waiting for Haswell so I can buy a new one.
I'd be interested to know who is getting very close to accurate results on their machines. (Being off by 1 or 2 is not a big deal since CPU availability is more important to a developer than the number of CPU cores)
Accuracy aside, which I'm sure can be improved to a certain degree, I don't think this is a desirable property for a browser to expose to JavaScript. Just because a node has a certain number of cores doesn't mean they're available to JavaScript, or that they even represent a realistic picture of what the machine has.

As other people have pointed out, things like Intel Turbo Boost change the performance characteristics of the machine, and things like virtualization can flat-out lie about the capacity of the machine.

It seems like a far more preferable solution is to lightly parallelize based on generic defaults and let the OS handle switching, instead of trying to outsmart the system.
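
To make "lightly parallelize based on generic defaults" concrete, here is a minimal sketch, assuming the work is a list of independent items. The default of 4 parts is an arbitrary number I picked, not a measured one, and the helper name is hypothetical. Each chunk would then be handed to its own web worker and the OS left to schedule them.

```javascript
const DEFAULT_PARTS = 4; // arbitrary generic default, not a measured value

// Splits items into at most `parts` contiguous chunks of near-equal size.
function splitIntoChunks(items, parts = DEFAULT_PARTS) {
  const count = Math.max(1, Math.min(parts, items.length));
  const size = Math.ceil(items.length / count);
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

const jobs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
console.log(splitIntoChunks(jobs).map((chunk) => chunk.length)); // [ 3, 3, 3, 1 ]
```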

Windows 7 (x64), AMD A6-3650, (4 core Llano).

IE10: consistently 7 cores, 11.0 seconds; FF21: consistently 4 cores, but the time varies between 15 and 17 seconds; Cr27: consistently 4 cores, 16.0 seconds.

I don't see the point of including hardware details in the W3C/HTML5/JavaScript specification.

And secondly, the demo said my old T7300 dual core has 3 cores.

If you read the blog you would know why this information is useful :)

Web workers make it possible to do parallel computation with JavaScript, and knowing the right number to spawn helps to make sure resources are being taken advantage of. We already have the ability to run JavaScript code on multiple processors, but currently, you can't tell where it's being run. For performance reasons, knowing the number of workers to spawn can make a big difference.

Secondly, it's called an estimator so it's never going to be perfect. It's a tool designed to give you some idea of what is going on.

I don't think the number of cores is really useful for determining the number of web workers to run in parallel.

If you want to push for something, push for a mechanism that frees you from having to set the number of workers to spawn.

I did find this interesting piece of information.

> In a compute-bound application running on an N-processor machine, adding additional threads may improve throughput as the number of threads approaches N, but adding additional threads beyond N will do no good. Indeed, too many threads will even degrade performance because of the additional context switching overhead. The optimum size of a thread pool depends on the number of processors available and the nature of the tasks on the work queue. On an N-processor system for a work queue that will hold entirely compute-bound tasks, you will generally achieve maximum CPU utilization with a thread pool of N or N+1 threads.

From http://www.ibm.com/developerworks/library/j-jtp0730/index.ht...

That's exactly what a threading library abstraction aims to do. For example the Intel TBB (Threading Building Blocks).
On an Intel i5-2520M (4 threads [1]) it reported 5 usable worker threads.

As I see from others, the result reported by core-estimator is spot on N+1, where N is the number of virtual cores [2].

It might be a coincidence, but this is exactly the number of jobs one would give to make when compiling [3].

Knowing N helps to be efficient when spawning threads (avoiding swamping or starving cores).

[1] http://ark.intel.com/products/52229/

[2] as reported for example by /proc/cpuinfo on Linux

[3] i.e. "make -j5"
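
The N+1 rule of thumb is simple enough to write down. A tiny sketch, with a hypothetical function name; the only thing taken from the quoted article and the comment above is the "N or N+1 threads for compute-bound work" guideline itself.

```javascript
// Rule of thumb for compute-bound work on N logical processors: N or N+1 threads.
// The function name and the extraThread flag are illustrative only.
function poolSizeForCpuBoundWork(processors, extraThread = true) {
  if (!Number.isInteger(processors) || processors < 1) {
    throw new RangeError("processors must be a positive integer");
  }
  return extraThread ? processors + 1 : processors;
}

// 4 hardware threads, as on the i5-2520M above, gives the "make -j5" figure.
console.log(poolSizeForCpuBoundWork(4));        // 5
console.log(poolSizeForCpuBoundWork(4, false)); // 4
```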

This is one of many variegated reasons why I'm no fan of JavaScript (and the modern HTML/JS/CSS admixture).

Hoping (P)NaCl will pick up steam soon...

Showed my OC'ed Quad Core as a Five Core processor.

Made me smile a bit.

Then I ran it a second time, got up to 32 cores, and the app crashed my browser.

5 cores on my Samsung Galaxy S3 (Chrome browser), 2 on my MacBook Air (Chrome browser)
12 core Mac Pro - estimated me at 61 cores. I wish!
"Estimated 7 cores - Took 17.242 seconds."

I have two cores.

"Estimated 14 cores - Took 22.704 seconds."

i5-2500k 4cores - 4 threads.

"Estimated 21 cores" I have four.
Useless. Apparently my 16-core Mac has only 14 cores.