|
|
|
|
|
by luispedrocoelho
3059 days ago
|
|
That may be the case. However, my point is that we started with a rather direct implementation of a formula in a paper. This was very easy to write but took hours on a test set (which we could extrapolate to taking weeks on real data!). Then, I spent a few hours and ended up with that ugly code that now takes a few seconds (and is dominated by the whole analysis taking several minutes, so it would not be worth it even if you could potentially make this function take zero time). Maybe with a few more hours, I could get both readability and speed, but that is not worth it (at this moment, at least). * The comment about the benchmark data being large is exactly my point: as datasets are growing faster than CPU speed, low-level performance matters more than it did a few years ago (at least if you are working, as I am, with these large data). |
|
1. Have gotten similar performance boosts elsewhere, meaning that you wouldn't have needed to refactor this function in the first place (although the implication of a 10000x speedup means that may not be true, although I can absolutely see the potential for 100x speedups in this code, depending on exactly what the input data is)
2. Its likely that there are much more natural ways to implement the function you have in pandas more idiomatically. These would be both clearer and likely equally fast, though possibly faster. (heck, there are even ways to refactor the code you have to make it look a lot like the direct from the paper impl)
In other words, this isn't (necessarily) a case of python having weak performance, its a case of unidiomatic python having weak performance. This is true in any language though. You can write unidiomatic code in any language, and more often than not it will be slower than a similar idiomatic method (repeatedly apply `foldl` in haskell). I'm not enough of an expert in pandas multi-level indexes to say that for certain, but I'd bet there are more efficient ways to do what you're doing from within pandas that look a lot less ugly and run similarly fast.
Granted, there's an argument to be made that the idiomatic way should be more obvious. But "uncommon pandas indexing tools should be more discoverable" is not the same as "python is unworkably slow".