by 0x0 4676 days ago
I wonder if this could be improved by just using the standard C library strptime(3) instead of going through sqlite?

I was wondering the same thing, since that's also what Apple recommends in situations like these. This is what I got on the same hardware:

strptime_l took 58.803 seconds

NSDateFormatter took 107.570 seconds

sqlite3 took 7.022 seconds

And with MishraAnurag's suggestion of using timegm instead of mktime:

strptime_l took 21.656 seconds

NSDateFormatter took 108.163 seconds

sqlite3 took 7.096 seconds

Why not see what sqlite is doing and do something in C yourself that solves the actual problem? It's not surprising that a general-purpose Obj-C (or any language) class isn't terribly fast at one specific thing.
Yeah, that would probably be the way to go ultimately if you're doing a lot of date parsing, I agree!
Are you using mktime to get the unix timestamp? That might be the slower part as opposed to strptime.
I am; the source is here: https://gist.github.com/jurre/6475263

My C is quite poor, so if you have any suggestions on how to improve it, I'd love to hear them!

I'd suggest using timegm instead of mktime, or setting the TZ environment variable to UTC, to ensure all implementations return an identical date. I ran the same tests and found that strptime was quite fast, but gmtime was taking most of the time. To speed that up, you could borrow SQLite's implementation. Check out the computeJD function from SQLite's date.c - http://www.sqlite.org/src/doc/trunk/src/date.c
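
For anyone following along, here is a minimal sketch of the strptime + timegm combination suggested above. It is not the code from the gist; the function name parse_utc_timestamp and the "YYYY-MM-DDTHH:MM:SS" input format are assumptions made for illustration. Note that timegm is a BSD/GNU extension rather than standard C, though it is available on OS X and glibc.

```c
#define _GNU_SOURCE   /* exposes strptime and timegm on glibc; harmless elsewhere */
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: parse "YYYY-MM-DDTHH:MM:SS" (assumed to be UTC)
 * into a Unix timestamp. Returns 0 on success, -1 on a format mismatch. */
int parse_utc_timestamp(const char *s, time_t *out)
{
    struct tm tm = {0};

    /* strptime returns a pointer to the first unparsed character,
     * or NULL if the input does not match the format. */
    const char *end = strptime(s, "%Y-%m-%dT%H:%M:%S", &tm);
    if (end == NULL || *end != '\0')
        return -1;

    /* timegm treats the broken-down time as UTC, so unlike mktime it
     * does not need to consult the local time zone rules. */
    *out = timegm(&tm);
    return 0;
}
```

Calling parse_utc_timestamp("1970-01-01T00:00:00", &t) should leave t at 0 regardless of the machine's time zone, which is the property mktime does not give you unless TZ is set to UTC.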
timegm actually already makes a huge difference, thanks! Might be useful to make a small fast date parsing library based on the sqlite source code.
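
As a rough idea of what such a small parser could look like, here is a sketch for one fixed format using only integer arithmetic. To be clear, this is not SQLite's computeJD: the day count uses the well-known "days from civil" formulation for the proleptic Gregorian calendar, and the names fast_parse_utc, read_digits and days_from_civil are made up for this example. It assumes the input is always "YYYY-MM-DDTHH:MM:SS" in UTC.

```c
#include <stdint.h>

/* Days since 1970-01-01 for a proleptic Gregorian date. */
static int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
    y -= (m <= 2);                                   /* treat Jan/Feb as end of previous year */
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = (unsigned)(y - era * 400);  /* year of era, 0..399 */
    const unsigned mp = (m > 2) ? m - 3 : m + 9;     /* month index with March = 0 */
    const unsigned doy = (153 * mp + 2) / 5 + d - 1; /* day of (shifted) year */
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (int64_t)doe - 719468;
}

/* Read exactly n ASCII digits starting at s; returns -1 on a non-digit. */
static int read_digits(const char *s, int n)
{
    int v = 0;
    for (int i = 0; i < n; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        v = v * 10 + (s[i] - '0');
    }
    return v;
}

/* Hypothetical parser for "YYYY-MM-DDTHH:MM:SS" (UTC).
 * Returns 0 and stores the Unix timestamp in *out, or -1 on bad input.
 * It range-checks fields but does not reject dates like Feb 30. */
int fast_parse_utc(const char *s, int64_t *out)
{
    /* Verify the length first so later indexing stays inside the string. */
    for (int i = 0; i < 19; i++)
        if (s[i] == '\0')
            return -1;
    if (s[19] != '\0' || s[4] != '-' || s[7] != '-' ||
        s[10] != 'T' || s[13] != ':' || s[16] != ':')
        return -1;

    int y = read_digits(s, 4);
    int mo = read_digits(s + 5, 2), d = read_digits(s + 8, 2);
    int h = read_digits(s + 11, 2), mi = read_digits(s + 14, 2);
    int sec = read_digits(s + 17, 2);

    if (y < 0 || mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || sec < 0 || sec > 59)
        return -1;

    *out = days_from_civil(y, (unsigned)mo, (unsigned)d) * 86400
         + h * 3600 + mi * 60 + sec;
    return 0;
}
```

The trade-off is the usual one: this handles exactly one layout and no time zones or leap seconds, which is where the speed would come from compared to a general-purpose formatter. Whether it actually beats the sqlite3 numbers above would need to be measured.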
If date formatting is a bottleneck for me (it is surprisingly often, because it's very slow in some languages), I typically just run it through the command-line program 'convdate' [1] from crush-tools, which is more or less just a wrapper around strptime+strftime.

[1] https://code.google.com/p/crush-tools/wiki/ConvdateUserDocs

If it is a bottleneck, shelling out is not a great solution...
Shelling out for each piece of data is indeed not great.

Shelling out for batch-processing loads of data, on the other hand, is great.

Yes, and that's the workload convdate is intended for: it batch-converts an entire column of a tab-delimited file. The larger crush-tools suite is intended for unix-style batch processing of tabular data, but fills in some functionality that the classic set of POSIX tools (cut, sort, paste, join, etc.) didn't cover.