by japhyr
1969 days ago
I was surprised to find out how slow strptime() can be. I was working on a data-focused project that was finally starting to slow down from the growing volume of data. I was looking at river heights over time, and once I hit about 140,000 data points the project got slow enough to make some profiling and optimization worthwhile. It turned out the project was spending more than two full seconds just running strptime(), out of a total execution time of around 15 seconds. I ended up looking at a bunch of different ways of processing timestamps in Python: strptime(), string parsing, regex, datetime.fromisoformat(), NumPy, Pandas, and more. I got a 46x speedup using datetime.fromisoformat(). Other approaches got anywhere from a 4x to a 40x speedup, and a couple of approaches were an order of magnitude slower than strptime(). My takeaway was that there's no substitute for profiling the actual code you're running and focusing on the specific bottlenecks in your own project. I wrote this up in a blog post if anyone's interested, "What's faster than strptime()?" https://ehmatthes.com/blog/faster_than_strptime/
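
For anyone who wants to try this kind of comparison on their own data, here is a minimal sketch, assuming the timestamps are plain ISO 8601 strings. The sample data and the function names are made up for illustration and this is not the code from the blog post; datetime.fromisoformat() is the standard-library parser for ISO strings (Python 3.7+). Timings will vary by machine and Python version, so treat the numbers it prints as a rough guide only.

```python
from datetime import datetime
import timeit

# Hypothetical sample data: one ISO 8601 timestamp per reading.
timestamps = [
    f"2023-01-{day:02d}T{hour:02d}:15:00"
    for day in range(1, 29)
    for hour in range(24)
]

def parse_strptime(values):
    # General-purpose parsing driven by an explicit format string.
    return [datetime.strptime(v, "%Y-%m-%dT%H:%M:%S") for v in values]

def parse_fromisoformat(values):
    # Specialized parser that only accepts ISO 8601 style strings.
    return [datetime.fromisoformat(v) for v in values]

# Both approaches should produce identical datetime objects.
assert parse_strptime(timestamps) == parse_fromisoformat(timestamps)

if __name__ == "__main__":
    # Time each approach on the same input; results depend on your setup.
    for fn in (parse_strptime, parse_fromisoformat):
        seconds = timeit.timeit(lambda: fn(timestamps), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

If your timestamps are not in ISO format, fromisoformat() will raise ValueError, so it only helps when you control or can normalize the input format.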