by chrisseaton 1921 days ago
> You realize this person that you are calling an expert claimed that they optimize software by putting tiny memory allocations into their tight loops right?

You're either misunderstanding, or pretending to misunderstand for some reason.

What I said was that allocating fresh objects and using those can be faster than re-using stale objects in some failed attempt to optimise by reducing allocations.

Why would that be? For the reasons I explained: The newly allocated objects are guaranteed to already be in cache. Each new object is guaranteed to be close to the last object you used, because they're allocated next to each other. The new objects are not going to need any memory barriers, because they're guaranteed to not be published. The new objects are less likely to escape, so they're eligible for scalar replacement.

You dismissed all that as 'throwing out terminology'.

Here's a practical example:

  require 'benchmark/ips'

  # Allocates a fresh array on every call.
  def clamp_fresh(min, max, value)
    fresh_array = Array.new
    fresh_array[0] = min
    fresh_array[1] = max
    fresh_array[2] = value
    fresh_array.sort!
    fresh_array[1]
  end

  # Re-uses a caller-supplied array instead of allocating a new one.
  def clamp_cached(cached_array, min, max, value)
    cached_array[0] = min
    cached_array[1] = max
    cached_array[2] = value
    cached_array.sort!
    cached_array[1]
  end

  cached_array = Array.new

  Benchmark.ips do |x|
    x.report("use-fresh-objects") { clamp_fresh(10, 90, rand(0..100)) }
    x.report("use-cached-objects") { clamp_cached(cached_array, 10, 90, rand(0..100)) }
    x.compare!
  end

Which would you think is faster? The one that allocates a new object each iteration of the inner loop? Or the one that re-uses an existing object each time and doesn't allocate anything?

It's actually the one that allocates a new object each time. The cached one is 1.6x slower in an optimising implementation of Ruby.

It's faster, and the only change I made was replacing the custom object caching with an object allocation. I went from allocating no objects to allocating one per call, and the code got faster. This example is so clear-cut because of the last factor I mentioned: scalar replacement.
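To make the scalar replacement point concrete, here is a rough hand-written sketch of what a compiler is free to reduce clamp_fresh to once it can prove the array never escapes the method. The name clamp_scalar_replaced and the three-comparison sort are my own illustration, not actual compiler output; a real implementation may do something quite different.

```ruby
# Hypothetical equivalent of clamp_fresh after scalar replacement:
# the three array slots become plain locals, so no array is allocated.
def clamp_scalar_replaced(min, max, value)
  a, b, c = min, max, value

  # Sort the three locals in place with compare-and-swap steps.
  a, b = b, a if a > b
  b, c = c, b if b > c
  a, b = b, a if a > b

  # The middle element, which is what fresh_array[1] was after sort!.
  b
end
```

The cached version can't be reduced like this as easily, because cached_array is visible to the caller: the writes to it have to actually happen in memory.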

If you came along and 'optimised' my code based on a cargo cult idea that 'object allocation is disastrously slow', you wouldn't be helping, would you?