Hacker News
by loarake 4673 days ago
My reply was in response to the statement "numpy is two orders of magnitude faster here; it's evidently using a highly optimized internal codepath for random sequence generation", which is false. The speedup does not come from highly optimized internal codepaths for random sequence generation; it comes from the code producing a numpy array (or not having to do type conversion). But I agree that when using numpy in a timing comparison, it would be fair either to start with a numpy array or to show the time involved in creating the array.
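A minimal sketch of that fairer comparison, assuming the stdlib `array.array` as a stand-in for a numpy array so it runs without numpy installed. It times the list-to-typed-array conversion separately from the operation being benchmarked, so both numbers can be reported. The helpers `timed` and `compare` are hypothetical, and the plain `sum` is just a placeholder operation; with numpy, the conversion step would presumably be something like `numpy.asarray(values)`.

```python
import array
import random
import time


def timed(func):
    """Run func() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def compare(n):
    """Report conversion time and operation time separately (hypothetical helper)."""
    # Input starts as a plain Python list of floats (the "native" starting point).
    values = [random.random() for _ in range(n)]

    # Step 1: type conversion into a typed container ('d' = C double).
    # Stand-in for building a numpy array; this cost belongs in the comparison.
    typed, convert_s = timed(lambda: array.array("d", values))

    # Step 2: the operation actually being benchmarked (placeholder: sum).
    total, op_s = timed(lambda: sum(typed))

    return {
        "sum": total,
        "convert_s": convert_s,
        "op_s": op_s,
        "total_s": convert_s + op_s,
    }
```

For example, `compare(3_000_000)` reports how much of the total is spent on conversion versus the operation itself.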

Thanks for your comments. I hope it didn't sound like I was negatively comparing numpy's array/sequence operations to anything. I know very little about numpy, and I assume that "real" numpy solutions don't look anything like what's being discussed here. I only included those measurements since the article's author did.

To clarify my points a bit: the optimizations I alluded to (in "highly optimized internal codepath") were meant to include things like using a generator, i.e., at no point is there an actual array of input random numbers. The fact that in numpy the 300-element "array" and the 3,000,000-element "array" had identical timings suggests exactly that. I disagree that it's an issue of internal representation, unless the concept of a numpy array subsumes the concept of a generator, in which case I think we're all saying the same thing.
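The distinction between a generator and an actual array of inputs can be illustrated in plain Python (not numpy). This sketch assumes summing random floats as the workload and uses the stdlib `tracemalloc` to measure peak allocation: the generator version never holds all the values at once, while the list version stores every one of them before summing. The names `peak_bytes`, `sum_generated`, and `sum_materialized` are hypothetical.

```python
import random
import tracemalloc


def peak_bytes(func):
    """Return the peak traced allocation (in bytes) while running func()."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def sum_generated(n):
    # Lazy: each random value is produced, consumed by sum(), then discarded.
    return sum(random.random() for _ in range(n))


def sum_materialized(n):
    # Eager: all n values are stored in a list before summing.
    values = [random.random() for _ in range(n)]
    return sum(values)


if __name__ == "__main__":
    for n in (300, 300_000):
        print(n, peak_bytes(lambda: sum_generated(n)), peak_bytes(lambda: sum_materialized(n)))
```

The generator's peak memory stays roughly constant as `n` grows, while the list version's grows with `n`.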

That kind of optimization is only possible in this case because, by the definition of randomness, nobody could know what the values were until they were enumerated, so using a generator is 100% transparent. That's not how real-world data works, hence my forced-native-array measurement and pudquick's reply.
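The forced-native-array idea can be sketched by generating the inputs into a list before the clock starts, so the timed region only sees data that already exists in memory, as real-world data would. `benchmark_on_materialized` is a hypothetical helper, not the measurement code actually used in this thread, and `random.Random(seed)` is there only to make runs reproducible.

```python
import random
import time


def benchmark_on_materialized(n, op, seed=0):
    """Time op(data) over inputs that already exist in memory (hypothetical helper).

    The random inputs are generated and stored in a list *before* the clock
    starts, so the timed region cannot overlap with (or skip) generation.
    """
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]  # forced materialization

    start = time.perf_counter()
    result = op(data)  # only the operation itself is timed
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `benchmark_on_materialized(3_000_000, sum)` returns the sum and the time spent summing, excluding the time spent generating the inputs.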