by data_scientist 4389 days ago

I tried to do a fair comparison between the main date parsing implementations. ciso8601 is really fast: 3.73 µs on my computer (MBA 2013). aniso8601, iso8601, isodate and arrow are all between 45 and 100 µs. The dateutil parser is the slowest (157 µs).

  >>> ds = u'2014-01-09T21:48:00.921000+05:30'

  >>> %timeit ciso8601.parse_datetime(ds)
  100000 loops, best of 3: 3.73 µs per loop

  >>> %timeit dateutil.parser.parse(ds)
  10000 loops, best of 3: 157 µs per loop
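For anyone not using IPython, roughly the same kind of measurement can be made with the standard `timeit` module. This is only a sketch: `best_usec_per_call` is a hypothetical helper name, and `datetime.fromisoformat` (standard library, Python 3.7+) is a stand-in so the snippet runs without third-party packages. The lambda adds a little call overhead, so the numbers will not match `%timeit` exactly.

```python
import timeit
from datetime import datetime

ds = '2014-01-09T21:48:00.921000+05:30'


def best_usec_per_call(func, arg, number=10000, repeat=3):
    """Return the best-of-`repeat` time per call, in microseconds.

    Hypothetical helper; it mimics the "best of 3" figure that
    IPython's %timeit reported in 2014.
    """
    timer = timeit.Timer(lambda: func(arg))
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6


# Stand-in parser (assumption): swap in ciso8601.parse_datetime or
# dateutil.parser.parse if those packages are installed.
usec = best_usec_per_call(datetime.fromisoformat, ds)
print('{:.2f} µs per loop'.format(usec))
```

Taking the minimum over the repeats is the usual choice here, since slower runs mostly reflect noise from the rest of the machine.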

A regex[1] can be fast, but the matching itself is just a small part of the time spent: about 2 µs out of 13 µs.

  >>> %timeit regex_parse_datetime(ds)
  100000 loops, best of 3: 13 µs per loop

  >>> %timeit match = iso_regex.match(ds)
  100000 loops, best of 3: 2.18 µs per loop
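To show where the rest of the time goes, here is a minimal sketch of a regex-based parser. It is not the pastebin code: the pattern `ISO_REGEX` and the body of `regex_parse_datetime` are my own assumptions, written for Python 3 and handling only the `YYYY-MM-DDTHH:MM:SS[.ffffff][Z|±HH:MM]` shape. After the match, the work is all int conversions and building the `datetime` and `timezone` objects.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern (assumption), not the one from the pastebin.
ISO_REGEX = re.compile(
    r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})'
    r'(?:\.(\d{1,6}))?'          # optional fractional seconds
    r'(Z|[+-]\d{2}:\d{2})?$'     # optional UTC offset
)


def regex_parse_datetime(s):
    """Parse one ISO 8601 datetime string with a regex (sketch only)."""
    match = ISO_REGEX.match(s)
    if match is None:
        raise ValueError('not an ISO 8601 datetime: {!r}'.format(s))

    # Everything below is the "rest of the time": converting the
    # captured groups and constructing the Python objects.
    year, month, day, hour, minute, second = (
        int(group) for group in match.groups()[:6])
    fraction, offset = match.group(7), match.group(8)
    # Right-pad the fraction so that '.5' means 500000 microseconds.
    microsecond = int(fraction.ljust(6, '0')) if fraction else 0

    tzinfo = None
    if offset == 'Z':
        tzinfo = timezone.utc
    elif offset:
        sign = -1 if offset[0] == '-' else 1
        delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
        tzinfo = timezone(sign * delta)

    return datetime(year, month, day, hour, minute, second,
                    microsecond, tzinfo)


parsed = regex_parse_datetime('2014-01-09T21:48:00.921000+05:30')
```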

Pandas is also slow on a single date. However, it is the fastest on a list of dates: just 0.43 µs per date!

  >>> %timeit pd.to_datetime(ds)
  10000 loops, best of 3: 47.9 µs per loop

  >>> l = [u'2014-01-09T21:{:02d}:{:02d}.921000+05:30'.format(i % 60, i // 60)
  ...      for i in xrange(1000)]  # 1000 different dates
 
  >>> len(set(l)), len(l)
  (1000, 1000)

  >>> %timeit pd.to_datetime(l)
  1000 loops, best of 3: 437 µs per loop

NB: pandas is, however, very slow on ill-formed dates such as u'2014-01-09T21:00:0.921000+05:30' (only one digit for the seconds): 230 µs per date, with no speedup from vectorization.

So if you care about speed and your dates are well formatted, put them in a list and parse them with pandas. If you can't use pandas, go for ciso8601. For thomas-st: it may be possible to speed up parsing of lists of dates the way pandas does. Another nice feature would be caching.
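On the caching idea, a rough sketch of what I mean, done on the caller's side rather than inside ciso8601. The names `cached_parse` and `parse_many` are hypothetical, and `datetime.fromisoformat` (standard library, Python 3.7+) again stands in for whichever parser you actually use. It only pays off when the same strings come up repeatedly.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=4096)
def cached_parse(ds):
    """Parse one ISO 8601 string, memoizing results by exact string.

    datetime.fromisoformat is a stand-in (assumption); replace it with
    ciso8601.parse_datetime if that package is available.
    """
    return datetime.fromisoformat(ds)


def parse_many(strings):
    """Parse a list of date strings, reusing cached results."""
    return [cached_parse(s) for s in strings]


ds = '2014-01-09T21:48:00.921000+05:30'
dates = parse_many([ds] * 3)
print(cached_parse.cache_info())
```

Since `datetime` objects are immutable, handing the same cached object back to several callers is safe. With a parser as fast as ciso8601 the hash lookup may cost nearly as much as the parse itself, so this is worth measuring before adopting.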

[1]: http://pastebin.com/ppJ4dzBP