A regex only seems to take ~1µs:

```
In [7]: iso_regex = re.compile('(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}(?:\\.?\\d+))')
In [8]: %timeit iso_regex.match('2014-01-09T21:48:00.921000')
1000000 loops, best of 3: 1.05 µs per loop
```
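To reproduce that kind of measurement outside IPython, something like the following should work. It uses the standard-library `timeit` module and the same pattern written as a raw string (the backslashes no longer need doubling). The loop count is an arbitrary choice, and absolute numbers will vary by machine and Python version.

```python
import re
import timeit

# Same pattern as in the IPython session, written as a raw string
# so the backslashes don't need doubling.
iso_regex = re.compile(r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2}(?:\.?\d+))')
sample = '2014-01-09T21:48:00.921000'

number = 100_000  # calls per run (arbitrary choice)
# Like %timeit's "best of 3": time three runs and keep the fastest one.
runs = timeit.repeat('iso_regex.match(sample)', globals=globals(),
                     number=number, repeat=3)
best_us = min(runs) / number * 1e6
print(f'{number} loops, best of 3: {best_us:.2f} µs per loop')
```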
But hey, once it's written in C, why go back?

I'm missing the timezone, but the OP left that out, so I did too. For comparison, dateutil's `parse` takes ~76µs for me. Kinda makes me wonder why aniso8601 is so slow. (It's also missing a few other things, depending on whether you count all the non-time forms as valid input.)

That said, cool! I might use this. One of the things that makes dateutil's `parse` slower is that it parses more than just ISO-8601: it parses many things that look like dates, including some very non-intuitive ones that have caused "bugs"¹. Usually in APIs, it's "dates are always ISO-8601", and all I really need is an ISO-8601 parser. While I appreciate the theory behind "be liberal in what you accept", sometimes I'd rather error out than build expectations that sending garbage (er, stuff that requires a complicated parse algorithm that I don't really understand) is okay.

¹ `dateutil.parser.parse('')` is midnight of the current date. Why, I don't know. Also, `dateutil.parser.parse('noon')` raises "TypeError: 'NoneType' object is not iterable".
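If erroring out is the behaviour you want, a minimal sketch of a strict parser might look like this. `parse_iso_strict` is a hypothetical helper, not part of dateutil or any other library. It accepts only `YYYY-MM-DDTHH:MM:SS` with an optional fraction of up to six digits, uses `fullmatch` so trailing junk is rejected, and raises `ValueError` for everything else, including `''` and `'noon'`. One deliberate difference from the session's pattern: there, `(?:\.?\d+)` is not optional, so a timestamp without fractional seconds such as `2014-01-09T21:48:00` does not match at all; here the fraction is a genuinely optional group.

```python
import re
from datetime import datetime

# Hypothetical strict parser: only YYYY-MM-DDTHH:MM:SS with an optional
# .f to .ffffff fraction. re.ASCII restricts \d to the digits 0-9.
_STRICT_ISO = re.compile(
    r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,6}))?',
    re.ASCII,
)


def parse_iso_strict(s: str) -> datetime:
    """Parse a basic ISO-8601 timestamp, raising ValueError on anything else."""
    # fullmatch (not match) so trailing junk is rejected too.
    m = _STRICT_ISO.fullmatch(s)
    if m is None:
        raise ValueError(f'not an ISO-8601 timestamp: {s!r}')
    year, month, day, hour, minute, second, frac = m.groups()
    # Right-pad the fraction so '.9' means 900000 microseconds.
    micro = int((frac or '0').ljust(6, '0'))
    # datetime() itself raises ValueError for out-of-range fields such as month 13.
    return datetime(int(year), int(month), int(day),
                    int(hour), int(minute), int(second), micro)


print(parse_iso_strict('2014-01-09T21:48:00.921000'))
for bad in ['', 'noon', '2014-13-01T00:00:00']:
    try:
        parse_iso_strict(bad)
    except ValueError as exc:
        print('rejected:', exc)
```

The trade-off is the one described above: anything outside this narrow shape is an error rather than a guess.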
Things the regex doesn't handle yet:

* Every part from month onwards is optional
* Separator characters are optional
* Date/time separator can be a space as well as T
* Timezone information
* Parsing the strings into numbers
* Actually creating a datetime object
I expect adding all of those will bump up the time a bit.
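As a rough sketch of what covering that list might involve, here is a regex-based parser with named groups. `parse_iso` is a hypothetical helper and the pattern is an approximation, not a full ISO-8601 grammar: it is more lenient than the standard in places (for example it accepts mixed separator styles such as `2014-0109`), and it does not handle week dates, ordinal dates, or more than six fractional digits. Omitted fields default to their smallest value (month and day 1, time fields 0), and a timezone becomes a fixed-offset `datetime.timezone`.

```python
import re
from datetime import datetime, timedelta, timezone

# Approximate pattern, not a full ISO-8601 grammar. Everything after the year
# is optional, separators are optional, and the date/time separator may be
# 'T' or a space. In re.VERBOSE mode the space inside [T ] is still literal.
_ISO = re.compile(r"""
    (?P<year>\d{4})
    (?:
        -?(?P<month>\d{2})
        (?:
            -?(?P<day>\d{2})
            (?:
                [T ](?P<hour>\d{2})
                (?:
                    :?(?P<minute>\d{2})
                    (?:
                        :?(?P<second>\d{2})
                        (?:[.,](?P<frac>\d{1,6}))?
                    )?
                )?
                (?P<tz>Z|[+-]\d{2}(?::?\d{2})?)?
            )?
        )?
    )?
""", re.VERBOSE | re.ASCII)


def parse_iso(s: str) -> datetime:
    """Hypothetical ISO-8601-ish parser returning a datetime."""
    m = _ISO.fullmatch(s)
    if m is None:
        raise ValueError(f'not an ISO-8601 timestamp: {s!r}')
    g = m.groupdict()

    # Timezone: 'Z' is UTC; '+HH', '+HHMM' or '+HH:MM' is a fixed offset.
    tz = None
    if g['tz'] == 'Z':
        tz = timezone.utc
    elif g['tz']:
        sign = -1 if g['tz'][0] == '-' else 1
        digits = g['tz'][1:].replace(':', '')
        offset = timedelta(hours=int(digits[:2]), minutes=int(digits[2:] or 0))
        tz = timezone(sign * offset)

    # Missing fields fall back to the smallest valid value.
    return datetime(
        int(g['year']),
        int(g['month'] or 1),
        int(g['day'] or 1),
        int(g['hour'] or 0),
        int(g['minute'] or 0),
        int(g['second'] or 0),
        int((g['frac'] or '0').ljust(6, '0')),
        tzinfo=tz,
    )


for s in ['2014', '2014-01', '20140109T214800',
          '2014-01-09 21:48:00.921000', '2014-01-09T21:48:00+01:00']:
    print(s, '->', parse_iso(s).isoformat())
```

Expect this to be slower than the bare `match` in the session above, since it also converts every group with `int()` and builds a `datetime`; how much slower is something to measure on your own machine.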