by codyguy 2950 days ago
Dear Walter,

Thank you for the great work you are doing.

If you could get people to include D in the benchmarks / shootouts that get published it would help improve the popularity of D.

FYI - Thatneedle.com already uses it for part of the backend data processing workflow.

Isaac Gouy is the gatekeeper for the shootout, and he refuses to include D for reasons he's refused to elucidate.

I stopped publishing benchmarks myself 15 years ago as people always assumed my thumb was on the scale.

>> for reasons he's refused to elucidate

For the simplest reasons that I have stated to you many times!

Including every language implementation that the language author would like to have included is more work than I am willing to do, period.

Sept 13 2008: 63 language implementations were shown-

https://web.archive.org/web/20080913030117/http://shootout.a...

- currently, 27 language implementations are shown.

You could truthfully say "he refuses to include [at least 30 language implementations]". In that regard, there's nothing special about D.

I have no context regarding the history of the benchmarks game. With that in mind, I have a question:

What about you create a technical specification (including both technical depth and common-sense breadth) for what kind of benchmark you'll accept, throw the whole thing on GitHub, and then refuse 99% of pull requests? :) (ie, only accept really really good quality benchmark implementations)

Eventually, enough developers unimpressed that Language X is not adequately represented would step up to the plate and maintain good-quality benchmark code. (This could get pretty interesting with rapidly-evolving languages like Rust.)

Obviously this is all very ideal and I can see so many ways such an endeavor could go horribly wrong, sure. I can also very easily see you having floated such an idea then discarded it for reasons I haven't even thought of.

What about I work to publish crowd-sourced programs and measurement scripts: so that anyone can make their own comparisons, on their own hardware, against whatever other language implementations they want to write programs for :-)

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

Ah; that certainly works too!

The only thing I wonder about is allowing any given set of hardware to be compared with another set of hardware with anything approaching accuracy.

What would be awesome is if someone could figure out whether they'd need a Raspberry Pi Zero to run some algorithm or if they could get away with just using a BBC Micro:Bit, by looking at a benchmark run submitted to the site from an i9-7940X.

Sadly I suspect the amount of realtime introspection needed (memory speed, practical cache coherency (per level), bus saturation, instructions used per cycle) would make this difficult. Even if a given benchmark was going to go fishing for performance counter info (and I just learned that even the Gen1's ARM 1176-based PMU provides a few, including one that counts instructions), benchmarking memory I/O is a bit harder; PCI DMA MMIO debug cards only map a small range of memory, like 256k or so, and I don't know if such cards can back the region with actual RAM. I suspect the access latency would be so different from normal RAM that even if this did exist it'd never be used. Sigh.

And then differences in compiler optimization approach would have to be taken into account, and thorough understanding of assembly language for all target architecture(s) would be needed to have the time of day to page through the diff analysis...

Hmm, I think "simple" got left behind a few thousand miles ago. It's kinda (morbidly?) fascinating how different the various architectures are on the ground, hah.

Oops:

- To clarify, I meant the Gen1 Raspi.

- I now realize (after having actually had a proper look) that while the Micro:Bit does use "JavaScript" and "Python", they're different, microcontroller-specific implementations, and I should have used a different example there, like an Intel NUC or perhaps an ODROID board or similar.

>> The only thing I wonder about is allowing any given set of hardware to be compared with another set of hardware with anything approaching accuracy.

Just systematically enough to be reminded that a different context may have different relative performance.

You've never explained what your criteria are. Just that you didn't want to include D.

I always did want to include D (which is why we did include D). Of course, we also used to include Scala and Clojure and…

I know you're publicly directly messaging, but as an outsider to the discussion, I don't see any actual mention in your posts of why D isn't included. Just that it "used to be."

There are many benchmark projects that include D and don't appear to be struggling under any kind of massive burden. For example:

https://github.com/kostya/benchmarks

What are your actual reasons for not keeping D... "scala... and Clojure and ..."? The results in the previous link show D as a massive competitor (sometimes 1st place beating C and C++) on both memory and speed. Wouldn't the purpose of benchmarks be... to highlight useful, highly-scoring languages? Isn't that one of the primary reasons people read benchmarks?

(The D implementations are also often smaller in lines of code, as per this benchmark project:

https://togototo.wordpress.com/2013/08/23/benchmarks-round-t...

)

>> I don't see any actual mention in your posts why D isn't included.

Do you see any mention at kostya/benchmarks of why ".NET Core" or Ada isn't included?

>> … benchmark projects that include D and don't appear to be struggling under any kind of massive burden.

Check when those kostya/benchmarks language implementation versions were released.

>> What are your actual reasons for not keeping…

I don't want to do the work.

I was a bit surprised by the inclusion of lesser languages and the omission of D !!

Please share the secret of your tenacity and inspiration in the face of such "setbacks". How do you do it on a day to day basis?

I know how good D is and I have a lot of confidence in it. There are a lot of great people using it that are getting their jobs done faster & better and are thereby making more money. That's what matters to me.

Everybody likes to make more money :-)

Back when I was a brand new engineer at Boeing, I was talking to my lead engineer about the beauty of aerospace engineering. He smiled and said I didn't get it. Boeing wasn't making airplanes, they were building money making machines for airline companies. That the airplane turned out to be a beauty was just a happy side effect!

D isn't about my personal aggrandizement. It's about how effective it is for users at solving their problems and making money for them.

Has there been any recent discussion on this? I suspect earlier reasons might have included D not being packaged for Debian?

It can be tough to shake "early license impressions" - D (dmd specifically) is one example, Ada (FSF GNAT vs AdaCore's GNAT Pro and Ada-gnat - GPL w/o runtime exception) is another.

[ed: if others are interested in "make your own measurements and host it yourself", the relevant page, with links to code etc., appears to be:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

]

> any recent discussion

Not since I asked him to stop dropping by the D forum to remind us that he wouldn't include D in the shootout.

I've personally lost all interest in it.

(The benchmarks are also small and too easily gamed by the author of the benchmark code, and the compiler developer. I encourage people to run timing tests on their own code, as that is what matters to them.)
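A minimal sketch of what "timing tests on your own code" could look like in Rust, assuming wall-clock time over repeated runs is good enough. The helper names `sample` and `median` are hypothetical, and the sort inside `main` is only a stand-in for a real hotspot.

```rust
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the wall-clock durations, sorted ascending.
fn sample<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    let mut times: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    times.sort();
    times
}

/// Median of an already-sorted, non-empty slice (upper median for even lengths).
fn median(sorted: &[Duration]) -> Duration {
    sorted[sorted.len() / 2]
}

fn main() {
    // Stand-in workload (an assumption): replace with the code you care about.
    let data: Vec<u32> = (0..100_000).rev().collect();
    let times = sample(15, || {
        let mut copy = data.clone();
        copy.sort();
        // Keep the result observable so the work is not optimized away.
        std::hint::black_box(&copy);
    });
    println!("min {:?}, median {:?}", times[0], median(&times));
}
```

Reporting the minimum and the median rather than a single run is one common way to damp noise from the rest of the system; it does not make results comparable across machines.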

My experience has been that I get better ideas of benchmarks by profiling my real application code, porting the hotspots to the new language or framework within a mostly bare skeleton and testing it. It’s not perfect, but gives me a better idea as to whether moving will actually help!

> I encourage people to run timing tests on their own code, as that is what matters to them.

This is the heart of the matter: benchmarking toy and unrelated bits of code amount to the same thing.

I think what people really need is a straightforward, accessible understanding of where a software implementation is inefficient. Design inefficiency can sometimes be "hacked over" (read "paved over") a bit like evening out technical debt; implementation inefficiencies, perhaps not so much. Making a complex database query faster by fiddling with (very) obscure SQL options because a given vendor's query planner is broken in a rare edge-case is one example that comes to mind.

I think it's especially sad when a software implementation becomes renowned for its inefficiency. It kinda takes the heat off in what I would argue is a very unfair way, like it legitimizes slowness like speed isn't a worthy pursuit... and then we wonder why our computers are slow. (Yeah, I'm thinking about Python here... and to a small extent many interpreted languages.)

In the context of D (and now DMC++ :D), so specifically compilers, it would be interesting to know what areas of the language don't generate especially fast code, or what bits might produce code that uses a little more memory than it needs to, etc. Because that's what people really want to know before they take the time to write/port; and if they know about all the instances of "don't do XYZ in this very specific way" ahead of time, maybe they can write the best possible code on the first try! (Design and implementation are intertwined in practice.)

I suspect the list of such "avoid"s may not be long. It might make for a particularly efficient kind of developer user manual.

I don't really see a big problem with a little bit of gaming on the benchmark-code side; with a few implementations to compare, it might even give some hints for idiomatic efficient code.

Gaming from the compiler/runtime side would be uglier - but I guess it is somewhat mitigated by real languages running the "general release" version.

No harm in "fastest way to list first 1000 prime numbers" being "print static list of first 1000 prime numbers".

And there's some value in having a standard benchmark harness that works easily across languages - as a helpful tool with "running your own benchmarks". Assuming the harness is any good, that is.

> I don't really see a big problem with a little bit of gaming on the benchmark-code side

I just got tired of the vitriol leveled at me with no basis. Things like I must have "sabotaged" other compilers. Probably the absolute worst one was when the journalist decided that Datalight Optimum C ran the benchmarks so fast, it must be a compiler bug and removed the benchmark results from his compiler roundup, calling DC a buggy compiler.

The reality was Datalight C was the first C compiler on DOS to do data flow analysis, and it deleted dead code. (Benchmarks of that era did nothing useful, and dfa detected that.) No cheating at all. A couple years later, everyone did dfa.
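The same effect is easy to hit with a modern optimizer: if a benchmark computes a value nobody reads, the compiler is free to delete the computation. A hedged Rust sketch of one way to guard against that, using the standard library's `std::hint::black_box` (documented as best-effort, not a guarantee). The names `sum_of_squares` and `time_it` are hypothetical, chosen only for this illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload: sum of i*i for i in 0..n, wrapping on overflow.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, i| acc.wrapping_add(i.wrapping_mul(i)))
}

/// Time `iters` calls of `f`. Both the input and the result pass through
/// `black_box`, so the optimizer has to assume they are observed and
/// should not fold the call to a constant or remove it as dead code.
fn time_it(iters: u32, n: u64, f: fn(u64) -> u64) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f(black_box(n)));
    }
    start.elapsed()
}

fn main() {
    let elapsed = time_it(1_000, 10_000, sum_of_squares);
    println!("1000 runs took {:?}", elapsed);
}
```

Without the two `black_box` calls, an optimizing build may legitimately report a time close to zero, which is a correct result for a benchmark that does nothing useful.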

I did run my own prime number crunching benchmark for fun and D blew Rust away but lagged behind Go. I used mutable Vec and HashSet in Rust, associative arrays and arrays in D, maps and arrays in Go to store the sieves.

You see, that's the point - even if you do something as simple as this, there are many ways to do it (and different compiler versions, especially in the case of D), different optimizations at the code and compiler level and all that - it's practically impossible to have a reliable, comprehensive benchmark.

(That said, D is lightning fast for me!)
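To make the "many ways to do it" point concrete, here is a sketch of the same Sieve of Eratosthenes written twice in Rust, once over a flat `Vec<bool>` and once over a `HashSet`. This is not the code from the benchmark described above (which was not posted); `primes_vec` and `primes_set` are hypothetical names.

```rust
use std::collections::HashSet;

/// Sieve of Eratosthenes that marks composites in a flat boolean array.
fn primes_vec(limit: usize) -> Vec<usize> {
    let mut composite = vec![false; limit + 1];
    let mut primes = Vec::new();
    for n in 2..=limit {
        if !composite[n] {
            primes.push(n);
            let mut m = n * n;
            while m <= limit {
                composite[m] = true;
                m += n;
            }
        }
    }
    primes
}

/// The same algorithm, but composites are recorded in a hash set.
fn primes_set(limit: usize) -> Vec<usize> {
    let mut composite: HashSet<usize> = HashSet::new();
    let mut primes = Vec::new();
    for n in 2..=limit {
        if !composite.contains(&n) {
            primes.push(n);
            let mut m = n * n;
            while m <= limit {
                composite.insert(m);
                m += n;
            }
        }
    }
    primes
}

fn main() {
    // Both variants produce identical output; only the container differs.
    assert_eq!(primes_vec(50), primes_set(50));
    println!("{:?}", primes_vec(50));
}
```

The two functions return the same primes, yet one does an indexed store per composite and the other hashes and possibly reallocates, so a timing comparison between languages depends heavily on which variant each side happened to write.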

HashSet uses a DoS-resistant hashing algorithm (SipHash) by default rather than the fastest one available, so it's gonna be slow.

What's the Rust equivalent of Go maps or D associative arrays then?
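One possible answer, sketched under assumptions: the closest standard-library equivalents are still `HashMap` and `HashSet`, and both accept a different hasher as a type parameter. The example below plugs in a hand-written 64-bit FNV-1a hasher through `std::hash::BuildHasherDefault`. The names `Fnv1a` and `FastSet` are hypothetical, and this hasher gives up the default's resistance to hash-flooding, so it is only reasonable for trusted input.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical minimal 64-bit FNV-1a hasher. Not DoS-resistant.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A `HashSet` that builds a fresh `Fnv1a` hasher for every lookup.
type FastSet<T> = HashSet<T, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut composite: FastSet<u64> = FastSet::default();
    composite.insert(4);
    composite.insert(6);
    println!("4 marked: {}, 5 marked: {}", composite.contains(&4), composite.contains(&5));
}
```

Whether this closes the gap with Go maps or D associative arrays would need measuring on the actual program; the sketch only shows that the hashing strategy is a swappable choice rather than a fixed property of the language.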

Not sure why you're getting downvoted; I was about to post something along the same lines (though admittedly linking to a couple of threads, not just the search). But I don't think everyone on HN would be aware of the D forums (nor that it's a rather magnificent piece of "groupware" with Usenet and IRC support on top of (or should I say beneath) a quick web interface).

https://github.com/CyberShadow/DFeed

From my perspective, the relevant "couple of threads" would be the old discussions which showed various people in the D community actually had the measurement scripts working with D programs: and yet that work becomes abandoned, so there's nothing like —

https://github.com/kostya/crystal-benchmarks-game