This is really difficult to do. For a number of reasons, speed does not necessarily scale proportionately with the number of cores:
(1) Intel Turbo Boost, which will run a single core at higher frequency than multiple cores (to make matters worse, this is also dependent on the CPU's temperature).
(2) Hyperthreading giving you anywhere between a x2 and no speedup depending on what your code is doing under the hood.
(3) NUMA giving, well, non-uniform speedups for some memory access patterns.
A few more data points: MacBook Air, Core i5, dual core with hyperthreading.
Chrome 27: 1 or 2 cores. Chrome 26 gave me 3 a few times.
Firefox 21: 8-16 cores. One time out of four, the test goes up to testing 32 cores, but the CPU activity drops abruptly and the test hangs (the button text remains a greyed-out "Running").
The thing that surprised me most is that on my colleague's system (the second Arch x64 one), the estimator went up to 16 cores before dropping back to 14, while my own system (Mint 14) and the other Arch system both went to 8, then back to 6.
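The "up to 16, then back to 14" pattern looks like a doubling search followed by a refinement step. I don't know how the script actually does it, so the following is only a guess at the general shape: `estimateWorkers`, `measureThroughput`, `maxWorkers` and `tolerance` are all made-up names, and `measureThroughput(k)` stands for whatever benchmark reports work per second with k workers.

```javascript
// Hypothetical search, NOT the estimator's real code.
// measureThroughput(k): assumed async callback returning work units/second with k workers.
async function estimateWorkers(measureThroughput, { maxWorkers = 64, tolerance = 0.1 } = {}) {
  let best = 1;
  let bestRate = await measureThroughput(1);

  // Doubling phase: 2, 4, 8, ... until the gain drops below the tolerance.
  for (let k = 2; k <= maxWorkers; k *= 2) {
    const rate = await measureThroughput(k);
    if (rate < bestRate * (1 + tolerance)) break;
    best = k;
    bestRate = rate;
  }

  // Refinement phase: binary search between the last two doubling steps for
  // the smallest count whose throughput is still within tolerance of the best.
  let lo = Math.floor(best / 2); // assumed to be too few workers
  let hi = best;                 // known to reach the plateau
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    const rate = await measureThroughput(mid);
    if (rate >= bestRate * (1 - tolerance)) hi = mid;
    else lo = mid;
  }
  return hi;
}
```

With a search like this, a single noisy measurement in the doubling phase is enough to send it one step too far or stop it one step early, which would explain different machines with similar hardware landing on 8 versus 16.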
Dual core Intel Core i5 MacBookPro8,1 running Chrome 26.0.1410.43 (OS X 10.8.2) - lots of tabs open. The script varies between 1 and 4 cores. Usually 1 core. More chance of calculating 3 or 4 cores if I scroll up & down rapidly.
On some machines it is likely that the result you get the second time will be different from the first.
And you're right, we designed the script to find the optimum number of web workers to use in parallel; that is its primary purpose. But we also want to drive adoption of navigator.cores, so we decided to call it a core estimator.
There is usually a lot of random stuff happening on the average user's machine. Could an anti-virus scan start in the middle of this test and fudge up the result?
Yeah. There are a lot of outside influences that can affect the results of the estimation. Unfortunately this is the only way to go until browser vendors adopt navigator.cores. Future versions will be able to cope better with this by storing past results in localStorage on the user's machine and averaging them out.
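To make the localStorage idea concrete, here is a rough sketch of what smoothing across runs could look like. It is not the library's code: `recordEstimate`, `createMemoryStorage` and the `coreEstimates` key are names I made up, and I take the median of the last few runs instead of a plain average, on the assumption that one run disturbed by a virus scan should not drag the result.

```javascript
// In-memory stand-in with the same getItem/setItem shape as localStorage,
// so the sketch also runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

const STORAGE_KEY = 'coreEstimates'; // hypothetical key name

// Record a new estimate and return the (lower) median of the last `keep` runs.
function recordEstimate(estimate, storage, keep = 5) {
  let history = [];
  try {
    const parsed = JSON.parse(storage.getItem(STORAGE_KEY) || '[]');
    if (Array.isArray(parsed)) history = parsed.filter(Number.isFinite);
  } catch (e) {
    history = []; // corrupted entry: start over
  }
  history.push(estimate);
  history = history.slice(-keep);
  storage.setItem(STORAGE_KEY, JSON.stringify(history));

  const sorted = [...history].sort((a, b) => a - b);
  return sorted[Math.floor((sorted.length - 1) / 2)];
}
```

In a page you would pass `window.localStorage` as `storage`. The in-memory version is only there for testing or for browsers where storage is disabled.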
In the long run, an API like navigator.cores is not what we actually want. How do you count cores in a world of SMT, shared caches, shared memory, and unrelated processes? Ideally, we want a higher-level API which allows the underlying system to dynamically adjust the level of parallelism.
So this 10 second test is supposed to run... when a page loads? It's a good effort, but we really need a native solution to this issue. Or maybe just approach it by assuming a 2 core minimum (most modern computers have at least two cores).
Also, I thought browsers didn't offer more thread access...
In practice, when people use our library, we recommend that the test is called only when it's needed, instead of directly when the page loads. There are other guidelines too, such as always allowing the user to manually edit the number of cores used in an app.
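Those two guidelines (run the test lazily, let the user override it) can be sketched as a small wrapper. This is only an illustration, not the library's API: `createWorkerCountProvider`, `runEstimate` and the fallback of 2 are assumptions, with `runEstimate` standing in for whatever async call starts the test.

```javascript
// Wraps an assumed async `runEstimate()` so that it runs at most once,
// and only when a worker count is actually requested.
function createWorkerCountProvider(runEstimate, fallback = 2) {
  let userOverride = null;
  let pending = null; // cached promise of the estimate

  return {
    // Let the user set the count by hand; pass null to clear the override.
    setUserOverride(count) {
      userOverride = Number.isInteger(count) && count > 0 ? count : null;
    },
    // Resolve to the user's value if set, else the (lazily started) estimate.
    async get() {
      if (userOverride !== null) return userOverride;
      if (pending === null) {
        // If the estimate fails, fall back to a fixed default (an assumption).
        pending = Promise.resolve().then(runEstimate).catch(() => fallback);
      }
      return pending;
    },
  };
}
```

Nothing runs at page load. The first `get()` pays for the test, later calls reuse the result, and a value typed in by the user always wins.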
Demo does not seem to work in Opera(1): I have stopped it when it tried 512 workers (after 15 minutes). And since only one of my 2 cores was ever working, I doubt the result would have been accurate.
(1): Opera 12.15, Debian GNU/Linux, Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz.
Did not work very well for me: my 3930K at 4.5 GHz was reported as having 3 cores. Monitoring the CPU usage, it only seemed to affect the usage on a couple of the cores; most of them remained idle while the test was running.
Edit: worked quite well on my Nexus 4. Got 4 cores on the first test.
My iPhone 4S came back with 2 cores consistently. It took ~25 seconds each time.
If the goal is to find the sweet spot for an algorithm, wouldn't it be best to run several small tests of the specific algorithm with different optimization values?
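Something along these lines is what I have in mind: time the real workload at a few candidate settings and keep the fastest. It's a minimal sketch, and `timeOnce`, `pickFastestSetting`, `runTask` and `measure` are hypothetical names. `runTask(setting)` would be a small slice of your own algorithm run with, say, that many workers.

```javascript
// Time one run of an assumed async `runTask(setting)`, in milliseconds.
async function timeOnce(runTask, setting) {
  const start = Date.now();
  await runTask(setting);
  return Date.now() - start;
}

// Measure each candidate `repeats` times and keep the setting whose best
// (lowest) time is smallest. `measure(setting)` resolves to a duration.
async function pickFastestSetting(candidates, measure, repeats = 3) {
  let best = null;
  for (const setting of candidates) {
    let fastest = Infinity;
    for (let i = 0; i < repeats; i++) {
      fastest = Math.min(fastest, await measure(setting));
    }
    if (best === null || fastest < best.time) best = { setting, time: fastest };
  }
  return best; // null when there are no candidates
}
```

Usage would look like `pickFastestSetting([1, 2, 4, 8], (n) => timeOnce(runMyAlgorithm, n))`, where `runMyAlgorithm` is your own function. Taking the minimum over a few repeats is a design choice to dampen one-off interference from other processes. It does not remove it.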
I'd be interested to know who is getting very close to accurate results on their machines. (Being off by 1 or 2 is not a big deal since CPU availability is more important to a developer than the number of CPU cores)
Accuracy aside, which I'm sure can be improved to a certain degree, I don't think this is a desirable property for a browser to expose to javascript. Just because a node has a certain number of cores doesn't mean they're available to javascript, or that they even represent a realistic picture of what the machine has.
As other people have pointed out, things like Intel Turbo Boost change the performance characteristics of the machine, and things like virtualization can flat-out lie about the capacity of the machine.
It seems like a far more preferable solution is to lightly parallelize based on generic defaults and let the OS handle switching, instead of trying to outsmart the system.
If you read the blog you would know why this information is useful :)
Web workers make it possible to do parallel computation with JavaScript, and knowing the right number to spawn helps to make sure resources are being taken advantage of. We already have the ability to run JavaScript code on multiple processors, but currently, you can't tell where it's being run. For performance reasons, knowing the number of workers to spawn can make a big difference.
Secondly, it's called an estimator so it's never going to be perfect. It's a tool designed to give you some idea of what is going on.
> In a compute-bound application running on an N-processor machine, adding additional threads may improve throughput as the number of threads approaches N, but adding additional threads beyond N will do no good. Indeed, too many threads will even degrade performance because of the additional context switching overhead. The optimum size of a thread pool depends on the number of processors available and the nature of the tasks on the work queue. On an N-processor system for a work queue that will hold entirely compute-bound tasks, you will generally achieve maximum CPU utilization with a thread pool of N or N+1 threads.
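Applied to web workers, the rule of thumb in that quote comes down to very little code. A minimal sketch, assuming you already have some processor count N (estimated or entered by the user); `computeBoundPoolSize` and `plusOne` are names invented for illustration:

```javascript
// Pool size for purely compute-bound tasks, per the quoted rule: N or N + 1.
// `processors` is whatever count the caller has; it is not detected here.
function computeBoundPoolSize(processors, plusOne = true) {
  if (!Number.isInteger(processors) || processors < 1) {
    throw new RangeError('processors must be a positive integer');
  }
  return plusOne ? processors + 1 : processors;
}
```

Note the rule only covers compute-bound work. For tasks that spend time waiting, the quote itself says the optimum depends on the nature of the tasks.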
(1) Intel Turbo Boost, which will run a single core at higher frequency than multiple cores (to make matters worse, this is also dependent on the CPU's temperature).
(2) Hyperthreading giving you anywhere between a x2 and no speedup depending on what your code is doing under the hood.
(3) NUMA giving, well, non-uniform speedups for some memory access patterns.
(4) Background processes occupying 1+ cores.