
Thanks for your comments. I hope it didn't sound like I was negatively comparing numpy's array/sequence operations to anything. I know very little about numpy, and I assume that "real" numpy solutions don't look anything like what's being discussed here. I only included those measurements since the article's author did.

To clarify my points a bit, the optimizations I alluded to (in "highly optimized internal codepath") were meant to include things like using a generator, so that at no point does an actual array of random input numbers exist in memory. The fact that in numpy the 300-element "array" and the 3,000,000-element "array" had identical timings suggests exactly that. I disagree that it's an issue of internal representation, unless the concept of a numpy array subsumes the concept of a generator, in which case I think we're all saying the same thing.

That kind of optimization is only possible in this case because, by the definition of randomness, nobody can know what the values are until they are enumerated, so substituting a generator is 100% transparent. That's not how real-world data works, hence my forced-native-array measurement and pudquick's reply.
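
To make the generator-versus-array distinction concrete, here is a rough sketch in plain Python. It says nothing about what numpy actually does internally; `random_stream`, `lazy_sum`, and `materialized_sum` are names I made up for illustration, and the seed is only there so both paths see the same values.

```python
import random
import sys


def random_stream(n, seed=0):
    """Yield n pseudo-random floats one at a time; no list is ever built."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.random()


def lazy_sum(n, seed=0):
    """Consume the stream directly: only one value is alive at a time."""
    return sum(random_stream(n, seed))


def materialized_sum(n, seed=0):
    """Force a real list first, the way pre-existing data would arrive."""
    values = list(random_stream(n, seed))
    return sum(values), sys.getsizeof(values)


# Creating a generator does no work up front, so the object is the same
# size whether it will eventually yield 300 or 3,000,000 values.
small_gen = random_stream(300)
large_gen = random_stream(3_000_000)
same_size = sys.getsizeof(small_gen) == sys.getsizeof(large_gen)

# Same values, same order, same total; only the list path pays for storage.
lazy_total = lazy_sum(1000)
list_total, list_bytes = materialized_sum(1000)
```

The lazy path is only a fair substitute because the values are random and nobody can inspect them beforehand. With data that already exists, the list (or array) has to be there, which is what the forced-native-array measurement was meant to capture.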