Hacker News new | ask | show | jobs
by DirtyAndy 4323 days ago
I would assume because PostgreSQL (the DB he is using) doesn't support importing from JSON?
2 comments

But why not use a JSON parser?
The dude had a whimsical idea, tried it out, and wrote it up for our enjoyment. Let it go. Maybe he will parse the JSON next time.
Next you're going to say you didn't build Pinboard with Angular, or something crazy like that.
if you're driving a nail into soft wood, you could use a hammer... Or you could just use a rock, or another piece of wood, or....

Using the wrong tool for a really easy job can sometimes be faster than the minimal effort of getting the right tool ready.

The regexp must have taken some time to write and debug:

s/^."n":"(.+)","i":"(.+)","p":\[([\d\.\-]+),([\d\.\-]+)],"s":"(\w+)","c":"(.+)".$/\1,\2,\3,\4,\5/;

It's actually pretty easy and quick to write when the data is highly regular (no pun intended!). You can write this kind of expression by taking one of the lines of input:

   {"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},
and then making a regular expression that matches that line literally [1]:

   m/^{"n":"Homewood","i":"inns_suits","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
     _                                     _                    _
Then replace the parts that will vary with regular expressions to capture them. We want to capture the "n" field:

   m/^{"n":"(.*?)","i":"inns_suits","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
            _____
and the "i" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
                        _____
and the longitude and latitudes from the "p" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"AL","c":"1"},/
                                     _____ _____
and the "s" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"(.*?)","c":"1"},/
                                                        _____
We don't care about the "c" field, so I'm going to drop it:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"(.*?)"/
If we want to be fancy, we can make sure that the latitude and longitude consist only of digits, decimal points, and minus signs:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[([\d.-]*?),([\d.-]*?)\],"s":"(.*?)"/
                                       ____       ____
For a one time thing like this, I'd probably deal with this data with a pipe in the shell, rather than use regular expressions:

   tr : , < in | tr -d '[]' | cut -d , -f 2,4,6,7,9 > out.csv
 
[1] I shall use Perl regular expression
I've been a Perl programmer and regular expression user for over a decade. It didn't take long and if you use them regularly, it's second nature. I agree, I could have used a JSON parser, but the source was well-structured and sed seemed easier. PostgreSQL can import JSON, but I haven't had a good experience with it so far.
You put together a really awesome project! Kudos on that and for sharing it first and foremost!! I speak regex pretty fluently, and I may very well have reached for sed/perl -ne there as well, depending on mood. I guess it just struck me as odd to use a parser for XML but not JSON. Anyhow, off on a tangent, I recently learned about a program called xgrep that lets you pull elements out of XML using xpath or pcre (althought pcre support was kind of spotty for me)--it's neat for one-offs like this and pipeline building while playing with rest apis and such!
He could also have had that sitting around from a recent project.. it doesn't really seem worth second-guessing.
Maybe, if you're doing it all at once. I often use my text editor as a regex platform, using the Find/Replace menu to dice off chunks of text in a series of quick and easy operations, using UNDO if I get one wrong. It's really very quick if you don't force yourself to do it in one step.
I guess, but honestly using a JSON parser sounds like less work. I mean, using regexes for this quick job is fine, it just sounded weird.
On the one hand I agree with the "It doesn't matter here" crowd, but on the other hand, Python (only language mentioned in the article) has a json library in the stdlib, and it's certainly easier than doing it with regexes.

So, it doesn't matter much, but it was an odd choice.

Even if it's in the standard library, it still might be more work to find where it is in the standard library than just do it the way you already know how.
Agreed... It's a devil you know vs. the "oh I'll just use this other library to do $simple_thing, how hard can it really be?" Hours later you realize that you didn't think it completely through and it's a bit tougher than you thought.
>it's certainly easier than doing it with regexes.

... for you.

It depends on the version. More recent releases (that is, >= 9.2) do have native JSON support, but it's still young — though steadily improving.
Yup, I've done a bit with 9.2, and it accepts JSON just fine (although not with automatic string conversion) but it's a short road to wishing the data were in standard tables and columns.
Use dollar-quoting for automatic string conversion:

insert into my_table values ($$ json_goes_here $$);

Works for all stringy things.

Cool, thanks. Last time around I used a pg-driver-specific datatype to hold the parameter on its way through sql2o, and felt rather guilty about doing so.