| You realize this person that you are calling an expert claimed that they optimize software by putting tiny memory allocations into their tight loops right? I'm not really sure how anything I'm saying is even controversial unless someone is desperate for it to be. Anyone who has profiled and optimized software has experience weeding out excessive memory allocations since it is almost always the lowest hanging fruit. No matter how fast allocating a few bytes is, doing what you are ultimately trying to do with those bytes is much faster. Allocating 8 bytes in 150 cycles might seem fast, until you realize that modern CPUs can deal with multiple integers or floats on every single cycle. A 12 year old CPU can allocate space for, and add together well over 850 million floats to 850 million other floats in a single second on a single core. You can download ISPC and verify this for yourself. By your own numbers, the allocation alone would take about a minute and a half. Neither of you has confronted this. I'm actually fascinated by the lengths you both have gone to specifically to avoid confronting this. Saying that lots of small allocations has no impact on speed is counter to basically all optimization advice and here I have explained why that is. |
You're either misunderstanding, or pretending to misunderstand for some reason.
What I said was that allocating fresh objects and using those can be faster than re-using stale objects in some failed attempt to optimise by reducing allocations.
Why would that be? For the reasons I explained: The newly allocated objects are guaranteed to already be in cache. Each new object is guaranteed to be close to the last object you used, because they're allocated next to each other. The new objects are not going to need any memory barriers, because they're guaranteed to not be published. The new objects are less likely to escape, so they're eligible for scalar replacement.
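To make the last point concrete, here is a minimal sketch of what "escaping" means, assuming a made-up `Pair` struct and made-up method names. It is not taken from any particular implementation, and the second method only approximates by hand what a compiler with scalar replacement may do.

```ruby
# Hypothetical example: Pair and all method names are invented for illustration.
Pair = Struct.new(:a, :b)

# What you write: a temporary object that never leaves the method.
def dist_sq(x, y)
  pair = Pair.new(x, y)
  pair.a * pair.a + pair.b * pair.b
end

# Roughly what scalar replacement can reduce it to: the fields become
# plain locals and no object is allocated at all.
def dist_sq_scalarised(x, y)
  a = x
  b = y
  a * a + b * b
end

# Storing the object somewhere visible makes it escape, so it has to
# exist as a real heap object and cannot be replaced by locals.
SEEN = []

def dist_sq_escaping(x, y)
  pair = Pair.new(x, y)
  SEEN << pair
  pair.a * pair.a + pair.b * pair.b
end
```

All three return the same value for the same arguments. The difference is only in whether the object can be proven private to the method.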
You dismissed all that as 'throwing out terminology'.
Here's a practical example:
Which would you think is faster? The one that allocates a new object each iteration of the inner loop? Or the one that re-uses an existing object each time and doesn't allocate anything?
It's actually the one that allocates a new object each time. The cached one is 1.6x slower in an optimising implementation of Ruby.
It's faster, and the only change I made was replacing the custom object caching with an object allocation. I went from not allocating any objects to allocating one per iteration, and it became faster. This example is so clear because of the last factor I mentioned: scalar replacement.
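For anyone who wants to try this kind of comparison themselves, the two variants have roughly the following shape. This is an illustrative sketch, not the exact benchmark: `Vec`, `sum_fresh`, `sum_cached` and `CACHED` are hypothetical names, and the result will depend on which Ruby implementation you run it on.

```ruby
# Hypothetical names throughout (Vec, sum_fresh, sum_cached, CACHED).
Vec = Struct.new(:x, :y)

# Allocates a new Vec on every iteration. The object never leaves the
# loop body, so a JIT with escape analysis may turn it into plain locals.
def sum_fresh(n)
  total = 0
  n.times do |i|
    v = Vec.new(i, i + 1)
    total += v.x * v.y
  end
  total
end

# Re-uses one long-lived Vec and mutates it. Nothing is allocated in the
# loop, but every iteration writes to and reads from a real heap object.
CACHED = Vec.new(0, 0)

def sum_cached(n)
  total = 0
  v = CACHED
  n.times do |i|
    v.x = i
    v.y = i + 1
    total += v.x * v.y
  end
  total
end
```

Both methods compute the same total, so you can time them against each other with the standard library's `Benchmark` module on whichever runtime you care about.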
If you came along and 'optimised' my code based on a cargo cult idea that 'object allocation is disastrously slow', you wouldn't be helping, would you?