|
|
|
|
|
by taeric
3196 days ago
|
|
This was what I assumed with the message. Doesn't change my point. If this is something that is truly a bottleneck, then it should be baked into numpy. In particular, agreeing for the array to calculate min/max/p50/p75/... in one pass is something that would make sense. Regarding the moving the calculation, yeah. I get that. I argue that the compiling step of the module happens once for the module, no? The JIT will be something you force onto every execution. Right? And none of this actually means this library shouldn't have been made. Just that it is a poor example for why it is better. |
|
I'd add some "intelligence" (if you will) to the class, so that when I calculate the max(), I'll also track the min, average, etc, save those values in an internal cache, and return the max. Next function call (wether it's max, min, avg, or any other of the cached values), it gets the result from the precomputed cache.
Of course, some operations will affect said results, and there are two ways to handle that: either modify the cached results, if we're talking about some change that can be formulated (say, multiplying by 2 will multiply every computed value by 2), or simply invalidate the cache and re-compute the next time one of this functions is called.
As it was said, this would mean calculating some things that aren't the explicitly asked thing (max, min, avg, etc), and giving a worse case scenario, in the hope that it will result in a speedup on some usecases.
My guess as to why they don't do it? Because it's not a "generalizable" problem, and you have the machinery at hand to implement your custom function that fits your particular use case (say, using Cython and the numpy/cython wrappers).