| Your statements in the blog about performance of that pattern are not accurate to the best of my knowledge. A program structured as a sequence of short, tight loops over vectors/slices as described is more or less hitting the performance sweet spot of modern microarchitectures. The general form of a short loop over a vector plays to the strengths of branch prediction and the instruction cache while the temporal and spatial locality of the memory accesses are conducive to caching and pipelining. Moreover, there's usually no need to allocate or copy the array in that sort of data flow. I mean, unless you're chasing worst case performance to make a point or something. Slice the buffers out of a pool and allow ownership of the data to follow the execution context, then you're free to modify it in place. The above points cover 3-4 orders of magnitude real world performance. It is true that there are some issues with the optimizer working in this pattern but I have only seen performance severely degraded in this pattern by compile time type ambiguity -- some but not all interface{} parameters, anonymous functions / closures in scopes with dynamic type. Interface types with a pointer receiver do not typically encounter these specific issues, and I believe the empty interface type has much better performance since ~1.12 or so as well. |
Sure, but it's faster still not to loop over intermediate arrays at all, because you never construct them in the first place when they aren't necessary.
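For concreteness, a minimal Go sketch of the difference (needs Go 1.18+ for generics). `Map` and `Filter` here are hypothetical helpers written for illustration, not standard library functions, and the sum-of-squares task is just an assumed example:

```go
package main

import "fmt"

// Map is a hypothetical helper (not stdlib): it allocates and returns a new slice.
func Map[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// Filter is a hypothetical helper (not stdlib): it also returns a new slice.
func Filter[T any](xs []T, keep func(T) bool) []T {
	var out []T
	for _, x := range xs {
		if keep(x) {
			out = append(out, x)
		}
	}
	return out
}

// SumSquaresOfEvensChained builds two intermediate slices before summing.
func SumSquaresOfEvensChained(xs []int) int {
	evens := Filter(xs, func(x int) bool { return x%2 == 0 })
	squares := Map(evens, func(x int) int { return x * x })
	total := 0
	for _, s := range squares {
		total += s
	}
	return total
}

// SumSquaresOfEvensFused computes the same result in one pass
// and never constructs an intermediate slice.
func SumSquaresOfEvensFused(xs []int) int {
	total := 0
	for _, x := range xs {
		if x%2 == 0 {
			total += x * x
		}
	}
	return total
}

func main() {
	xs := []int{1, 2, 3, 4, 5, 6}
	fmt.Println(SumSquaresOfEvensChained(xs), SumSquaresOfEvensFused(xs)) // prints "56 56"
}
```

Both functions return the same value; the chained one just pays for two extra slices along the way. How much that costs in practice depends on the workload, so benchmark before taking my word for it.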
"Moreover, there's usually no need to allocate or copy the array in that sort of data flow. I mean, unless you're chasing worst case performance to make a point or something. Slice the buffers out of a pool and allow ownership of the data to follow the execution context, then you're free to modify it in place."
That starts getting into "I'm sure someone can come up with some solution that meets some of these goals". Whatever map you're describing here isn't the one that is defined as creating a new slice by mapping a function over the old slice. I mean, it kinda sounds like you're saying "well, if you just write a conventional for loop you can do all this in one pass" to me? Which is my point? I'm not the one pitching for lots of array creation; that's the people who insist on using maps and filters in a language that, of all the major languages, just isn't going to put in the optimization time to convert them back into loops under the hood.
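To make the distinction concrete, a rough Go sketch. `MapNew` and `ScaleInPlace` are hypothetical names for illustration; the buffer is a plain slice literal here, though in the scenario quoted above it might be sliced out of a pool:

```go
package main

import "fmt"

// MapNew is map as it is usually defined: it returns a fresh slice
// and leaves the input untouched. (Hypothetical helper, not stdlib.)
func MapNew(xs []float64, f func(float64) float64) []float64 {
	out := make([]float64, len(xs))
	for i, x := range xs {
		out[i] = f(x)
	}
	return out
}

// ScaleInPlace overwrites the caller's buffer and allocates nothing.
// Note that it is just a conventional for loop.
func ScaleInPlace(buf []float64, k float64) {
	for i := range buf {
		buf[i] *= k
	}
}

func main() {
	buf := []float64{1, 2, 3}

	doubled := MapNew(buf, func(x float64) float64 { return 2 * x })
	fmt.Println(buf, doubled) // prints "[1 2 3] [2 4 6]"

	ScaleInPlace(buf, 2)
	fmt.Println(buf) // prints "[2 4 6]"
}
```

The in-place version only works if the caller owns the buffer and nobody else needs the old values, which is exactly the ownership assumption in the quoted comment.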