| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by no-bugs 3513 days ago

> made millions of dollars of revenue... I'm not 100% sure you are well situated to lecture me about the 'real world' because you wrote an academic paper.

So, we're going to discuss millions of dollars instead, sigh... This way Trump should be one of the best programmers in the universe, I guess.

> Typically it will be inherently cheaper than 15-30 cycles - but lost opportunities for optimization may take you to numbers that are insanely higher than that.

And still, even from your own rant it follows that in vast majority of cases the estimate of 15-30 cycles will be well within "order of magnitude" (TBH, I didn't see millions myself, but it should be a really strange corner case).

> If you say 15-30 is the magic number you are creating folklore... We are burdened by folkloric stuff... that result in considerably worse designs.

And not having any such "folklore" results in even worse designs :-( (actually - MUCH worse ones). Using list instead of vector can easily give you 100x penalty for absolutely zero reason (actually, up to 780x was experimentally observed - and that's without swapping). Off-loading 100-cycle chunks (with a thread context switch back after calculating these 100 cycles) to a different thread will never work at least on a x64 (though I remember meeting some folks from Intel - I think they were representing an OpenMP team - who were seriously preaching otherwise, based on utterly silly "how many cores we managed to utilise" metric without realising that the whole thing became _slower_ after they parallelised it ;-( ). And so on and so forth.

Sure, the numbers are very rough. But trying to say that "hey, it is not precise so let's not even try to estimate" - is even worse than that.

1 comments

glangdale 3513 days ago

I don't think Trump made his money doing low-level performance programming for the past 10 years, so I'm not sure your analogy is valid.

However, since you, whoever you are, have not only written a hash table, but discovered profundities like 'Sometimes list costs 780x as much as vector for "absolutely zero reason"', and 'don't try to offload 100 cycles of work to another thread' I'm going to defer to your expertise. I recommend you stick a bone through your beard, pronounce yourself a performance guru, and make bank. Have fun.

no-bugs 3512 days ago

> I recommend you stick a bone through your beard, pronounce yourself a performance guru, and make bank. Have fun.

:-) :-) I LOVE when my opponent has to resort to personal insults :-). Leaving aside any sarcastic remarks in this regard:

For Intel's sake I Really Hope that these "profundities" are indeed very well-known to you - and believe it or not, they're very well-known to me too for at least 10 years. However, this is not the point; the point is that there are LOTS of developers out there who do NOT know them - and the OP is intended for them (and not for "performance gurus").

It is actually THIS simple. Eliminating 90% of inefficiency does not really require black magic or "performance gurus" who know exactly how the pipeline of specific CPU works. And this is exactly what I'm arguing for - to educate app-level developers and architects about this low-hanging fruit of 10x+ inefficiencies; I can assure you that it is very far from being universal knowledge in app-level development circles.