by igouy 618 days ago

Perhaps clarifications rather than questions.

> "Critique: 2.2.1 Programming Language versus Implementation"

In context, it seems possible to read "Programming Language" as shorthand for "Programming Language Implementation" where appropriate.

(Especially since “Pereira et al.” list "Compiler / Interpreter Versions" such as "JRuby : jruby 9.1.7.0" and "Ruby : ruby 2.4.1".)

> "Critique: 2.2.2 Quality of Benchmark Implementations"

Surely the issue is not the quality of the particular programs that “Pereira et al.” selected from the benchmarks game for their comparison, but the suitability of the selection process they used to choose programs for their purpose.

That "corpus of small benchmark implementations" most likely provided both parallel and sequential programs, most likely provided both SIMD and non-SIMD programs, etc.

Presumably “Pereira et al.” could have chosen only sequential / non-SIMD / standard library programs for their comparison, but did not.

> "Critique: 2.2.3 Apparent Anomalies."

> "C++ is reported as being 34% less energy efficient and 56% slower than C"

For a single outlier (regex-redux) there's a 12x difference between the measured times of the selected (pcre) C and (boost/regex) C++ programs.

As you say, apparent anomalies presented without investigation or explanation.
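
To illustrate why a single outlier matters, here is a minimal sketch. Only the 12x regex-redux ratio comes from the comment above; the count of ten benchmarks and the assumption that the other nine have a C++/C time ratio of exactly 1.0 are hypothetical, and nothing here says how “Pereira et al.” actually aggregated their results.

```python
from statistics import geometric_mean, mean

# Hypothetical data: nine benchmarks where C++ and C take the same time,
# plus one outlier with the 12x ratio quoted above for regex-redux.
ratios = [1.0] * 9 + [12.0]

arithmetic = mean(ratios)            # dominated by the single outlier
geometric = geometric_mean(ratios)   # far less sensitive to it

print(f"arithmetic mean ratio: {arithmetic:.2f}")
print(f"geometric mean ratio:  {geometric:.2f}")
```

Under these assumed numbers, one program is enough to make the arithmetic mean suggest C++ is about twice as slow overall, which is why an unexplained outlier deserves investigation.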

> "TypeScript is reported as being 4.8× less energy efficient and 7.1× slower than JavaScript."

It seems that there may have been some kind of problem with tsc at the time.

The exact same fannkuch-redux program that took 1,234.81 seconds (node.js v8.1.3 and tsc 2.4.1) in July 2017 took only 147.23 seconds (node.js v9.4.0 and tsc 2.6.2) in January 2018.
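
As a quick check on the size of that change, the ratio of the two quoted times can be worked out directly; the variable names are just for illustration.

```python
# Times quoted above for the same fannkuch-redux program (seconds).
july_2017 = 1234.81     # node.js v8.1.3, tsc 2.4.1
january_2018 = 147.23   # node.js v9.4.0, tsc 2.6.2

speedup = july_2017 / january_2018
print(f"{speedup:.1f}x faster")  # roughly 8.4x
```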

(Unfortunately the Internet Archive is currently unable to provide details.)

nicovank 618 days ago

> Quality of Benchmark Implementations

Correct. "Selection of Benchmark Implementations" is a better name here. We'll update this in the next iteration. The point in this subsection is indeed that the selection is not adequate for comparison. This is not the only issue: even an adequate selection of perfectly idiomatic and identical implementations would not have resulted in accurate comparison.

> C/C++ Outlier

Correct, Section 4.5.2 details this. It is 8.9x for us.

> JS/TS Outlier

The main outlier on our machine is mandelbrot, 21x (Section 4.5.1). Our second outlier is n-body (not discussed).

igouy 618 days ago

> would not have resulted in accurate comparison

Because? Is the reasoning for that spelled-out somewhere in the paper?

> Section 4.5.2

> Section 4.5.1

After the paper had discussed “Pereira et al.” I repeatedly confused discussion of your new measurements with discussion of the old “Pereira et al.” measurements.

> "forcing benchmarks to run on a single core" p2&3

> "we eliminate the effect of varying concurrency in different benchmark implementations by limiting benchmarks to execute on a single core" p6

> "the JavaScript version uses 28 cores on average" p14

fwiw I am now very confused.

emeryberger 615 days ago

We only pin to one core for one experiment described in Section 4.3. All the remaining experiments are run with full access to all cores.
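
For readers unfamiliar with pinning, a minimal sketch of restricting a process to a single core on Linux follows. This is not the paper's harness (the thread does not say how the pinning was done); `pin_to_one_core` is a hypothetical helper, and `os.sched_setaffinity` is only available on some platforms, Linux among them.

```python
import os

def pin_to_one_core():
    """Restrict the current process to one CPU; return the resulting CPU set.

    Returns None where the affinity API is unavailable (e.g. macOS, Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Pick a CPU we are already allowed to run on, so the call cannot fail
    # inside a restricted cpuset.
    cpu = min(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})   # 0 means "the calling process"
    return os.sched_getaffinity(0)

print(pin_to_one_core())
```

On Linux, child processes started afterwards inherit the affinity mask, so a benchmark launched from such a process would stay on that one core as well.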

igouy 614 days ago

Thank you.

I'm concerned that section 2.2.1 is a misreading of Pereira et al.

[29] "… the performance of a language is influenced by the quality of its compiler, virtual machine, garbage collector, available libraries, etc."

In that context it seems plain that "language" must be understood as a shortening of "language implementation."

> "For instance, Pereira et al. treat Ruby and JRuby as different languages, while they are in fact two separate implementations of the same Ruby language."

It seems to me that Pereira et al. treat Ruby and JRuby as different "language implementations" and compare each one independently against the other language implementations.

(In the "corpus of small benchmark implementations" it was simply convenient to keep separate programs for Ruby and JRuby.)

emeryberger 614 days ago

Those papers say "language" over and over again, in the titles, in the body of the text. That work confounds languages and their implementations, and makes it sound like there is a one-to-one connection between the two (of course, there is not necessarily such a correspondence).

With respect to Ruby vs. JRuby: my student just checked and verified that some but not all of the benchmarks are implemented differently (k-nucleotide, mandelbrot, pidigits, spectral-norm).

igouy 613 days ago

> Those papers say "language" over and over again, in the titles, in the body of the text.

Yes they do! And over and over again, in context, we sensibly read that to mean what you wish to term more precisely "language implementation".

> Fig. 4 Fig. 5 "We are in fact comparing implementations of programming languages, not the languages themselves."

They know. They just prefer shorter names.

Here's their lookup table from short names to precise names:

https://sites.google.com/view/energy-efficiency-languages/se...