by birken
4389 days ago
Pandas (the data analysis library for Python) has a lot of Cython and C optimizations for datetime string parsing:

They have their own C function which parses ISO-8601 datetime strings: https://github.com/pydata/pandas/blob/2f1a6c412c3d1cbdf56610...

They have a version of strptime written in Cython: https://github.com/pydata/pandas/blob/master/pandas/tslib.py...

I'm not saying these are better or worse than your solution. I haven't done any benchmarks, and the pandas functions sometimes cut a few corners, but perhaps there is something useful there for reference anyway. They also don't deal directly in datetime.datetime objects; they use pandas-specific intermediate objects, but the code should be simple enough to grok.

Having done some work with dateutil, I can tell you that dateutil.parser.parse is slow, but its main use case shouldn't be converting strings to datetimes when you already know the format. If you know the format, you should use datetime.strptime or some faster variant (like the one above).

There is a nice feature of pandas where, given a list of datetime-y strings in an arbitrary format, it will attempt to guess the format using dateutil's lexer (https://github.com/pydata/pandas/blob/master/pandas/tseries/...) combined with trial and error, and then use a faster parser instead of dateutil.parser.parse to convert the array if possible. In the general case this resulted in about a 10x speedup over dateutil.parser.parse when the format was guessable.
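
To make the "guess the format once, then use a fast parser" idea concrete, here is a rough sketch using only the standard library. It is not how pandas does it: pandas derives the format from dateutil's lexer, whereas this just tries a small hard-coded list of candidates. The names CANDIDATE_FORMATS, guess_format and parse_all are made up for illustration.

```python
from datetime import datetime

# Hypothetical candidate list for illustration only. A real guesser would
# derive the format from the tokens of the sample string instead.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y %H:%M",
)


def guess_format(sample):
    """Return the first candidate format that parses `sample`, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue  # this candidate does not match, try the next one
        return fmt
    return None


def parse_all(strings, fallback=None):
    """Guess a format from the first string and reuse it for the whole list.

    If no format can be guessed, or the guess fails on a later element,
    every string is handed to `fallback` (a slower, more lenient parser).
    """
    if not strings:
        return []
    fmt = guess_format(strings[0])
    if fmt is not None:
        try:
            return [datetime.strptime(s, fmt) for s in strings]
        except ValueError:
            pass  # the guess did not hold for every element
    if fallback is None:
        raise ValueError("no candidate format matched every string")
    return [fallback(s) for s in strings]
```

You would call it as parse_all(list_of_strings, fallback=dateutil.parser.parse), so the slow general-purpose parser only runs when the guess fails. I haven't benchmarked this sketch, so don't read the 10x figure above as applying to it.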