by ajanuary 4389 days ago
The library has the following features your regex is missing:

* Every part from month onwards is optional

* Separator characters are optional

* Date/time separator can be a space as well as T

* Timezone information

* Parsing the strings into numbers

* Actually creates a datetime object

I expect adding all of those will bump up the time a bit.
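For a rough idea of what the last two items involve, here is a minimal sketch. The `strict` pattern and the `to_datetime` helper are made up for illustration (they are not the regex under discussion and not the library's code); the point is only that every captured string has to go through `int()` and then into a `datetime` constructor call, which is work a bare `match()` never does.

```python
import re
from datetime import datetime

# Hypothetical strict pattern: every field required, fixed separators, no timezone.
strict = re.compile(r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})')

def to_datetime(text):
    """Match, convert each captured string to an int, and build a datetime."""
    m = strict.match(text)
    if m is None:
        return None
    # Six int() conversions plus one object construction on top of the match.
    return datetime(*(int(g) for g in m.groups()))
```

Timing `strict.match(...)` against `to_datetime(...)` with `%timeit` would show how much the conversion and construction add on a given machine.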

1 comments

I'm not much of a regex wizard, but I tried to add all the features listed other than parsing the result and creating the datetime object.

    import re

    iso_regex = re.compile(r'([0-9]{4})-?([0-9]{1,2})(?:-?([0-9]{1,2})(?:[T ]([0-9]{1,2})(?::?([0-9]{1,2})(?::?([0-9]{1,2}(?:\.?[0-9]+)?))?(?:(Z)|([+-][0-9]{1,2}):?([0-9]{1,2})))?)?)?')

It performs quite a bit worse than the library, about 26 times slower in the timings below, even though the library also creates the full object.

    In [82]: %timeit ciso8601.parse_datetime('2014-01-09T21:48:00.921000')
    1000000 loops, best of 3: 368 ns per loop

    In [83]: %timeit iso_regex.match('2014-01-09T21:48:00.921000')
    100000 loops, best of 3: 9.72 µs per loop

In the interest of intellectual pursuit, is there anything that can be done to the regex to speed it up?
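One thing that stands out when reading the pattern: the timezone group sits inside the minutes group and is not optional, so for an input without a timezone the engine tries the minutes, seconds and fraction, fails on the missing timezone, backtracks through every alternative, and ends up matching only as far as the hour. The sketch below compares the pattern as posted with a variant that makes the timezone optional and requires the fraction to start with a dot. The names `posted` and `variant` are mine, and the variant is an untimed guess rather than a measured improvement.

```python
import re

# The pattern as posted: once minutes are present, a timezone is required.
posted = re.compile(
    r'([0-9]{4})-?([0-9]{1,2})(?:-?([0-9]{1,2})(?:[T ]([0-9]{1,2})'
    r'(?::?([0-9]{1,2})(?::?([0-9]{1,2}(?:\.?[0-9]+)?))?'
    r'(?:(Z)|([+-][0-9]{1,2}):?([0-9]{1,2})))?)?)?'
)

# Variant (an assumption, not benchmarked): the timezone group is optional,
# and the fraction must begin with a dot so it cannot be split several ways.
variant = re.compile(
    r'([0-9]{4})-?([0-9]{1,2})(?:-?([0-9]{1,2})(?:[T ]([0-9]{1,2})'
    r'(?::?([0-9]{1,2})(?::?([0-9]{1,2}(?:\.[0-9]+)?))?)?'
    r'(?:(Z)|([+-][0-9]{1,2}):?([0-9]{1,2}))?)?)?'
)

sample = '2014-01-09T21:48:00.921000'

# How much of the sample each pattern actually consumes.
posted_span = posted.match(sample).group(0)
variant_span = variant.match(sample).group(0)
```

Running `%timeit variant.match(sample)` next to the original would show whether removing the failed timezone attempt makes a noticeable difference. Note that the variant also accepts a timezone directly after the hour, which the posted pattern does not.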