by tzs 4322 days ago
It's actually pretty easy and quick to write when the data is highly regular (no pun intended!). You can write this kind of expression by taking one of the lines of input:

   {"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},
and then making a regular expression that matches that line literally [1]:

   m/^{"n":"Homewood","i":"inns_suits","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
     _                                     _                    _
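A quick way to check this starting point is to run the literal pattern against the line it was copied from. This is a minimal sketch in Python's `re` module rather than Perl (my substitution; this particular pattern reads the same in both), and the names `line` and `literal` are made up for illustration:

```python
import re

# The sample input line from above.
line = '{"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},'

# The literal pattern: an anchor at the start, and the two square brackets escaped.
# The unescaped dots match any character, so this is slightly looser than a
# strict literal match, which is fine for a starting point.
literal = re.compile(
    r'^{"n":"Homewood","i":"inns_suits","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},'
)

print(literal.match(line) is not None)
```

Every other line of input fails to match at this stage, which is what the following steps fix.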
Then replace the parts that will vary with regular expressions to capture them. We want to capture the "n" field:

   m/^{"n":"(.*?)","i":"inns_suits","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
            _____
and the "i" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
                        _____
and the latitude and longitude from the "p" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"AL","c":"1"},/
                                     _____ _____
and the "s" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"(.*?)","c":"1"},/
                                                        _____
We don't care about the "c" field, so I'm going to drop it:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"(.*?)"/
If we want to be fancy, we can make sure that the latitude and longitude consist only of digits, decimal points, and minus signs:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[([\d.-]*?),([\d.-]*?)\],"s":"(.*?)"/
                                       ____       ____
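To see the finished expression at work, here is a rough sketch that applies it to the sample line. It uses Python's `re` module instead of Perl (an assumption on my part; the pattern is copied unchanged), and `parse_line` is a hypothetical helper name:

```python
import re

# Final pattern from above, minus the Perl m/.../ delimiters.
PATTERN = re.compile(
    r'^{"n":"(.*?)","i":"(.*?)","p":\[([\d.-]*?),([\d.-]*?)\],"s":"(.*?)"'
)

def parse_line(line):
    """Return (name, kind, lat, lon, state), or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    name, kind, lat, lon, state = m.groups()
    # The character class only restricts the characters used, so float() is
    # what actually checks that these are numbers.
    return name, kind, float(lat), float(lon), state

line = '{"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},'
print(parse_line(line))
# prints ('Homewood', 'inns_suits', 33.455237, -86.81964, 'AL')
```

Lines that don't fit the pattern come back as `None`, which makes it easy to count or log the ones that were not as regular as expected.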
For a one-time thing like this, I'd probably deal with this data with a pipeline in the shell, rather than use regular expressions:

   tr : , < in | tr -d '[]' | cut -d , -f 2,4,6,7,9 > out.csv
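The field numbers make more sense if you follow one line through the pipeline. This is a sketch that mimics the three stages in Python, purely to show the intermediate text (`pipeline` is a hypothetical name, and this is not how `tr` or `cut` are implemented):

```python
def pipeline(line):
    # tr : ,   turns every colon into a comma, so keys and values both
    # become comma-separated fields.
    s = line.replace(":", ",")
    # tr -d '[]'   deletes the square brackets around the coordinates.
    s = s.replace("[", "").replace("]", "")
    # At this point the sample line reads:
    # {"n","Homewood","i","inns_suits","p",33.455237,-86.81964,"s","AL","c","1"},
    # cut -d , -f 2,4,6,7,9   keeps those 1-based fields.
    fields = s.split(",")
    return ",".join(fields[i - 1] for i in (2, 4, 6, 7, 9))

line = '{"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},'
print(pipeline(line))
# prints "Homewood","inns_suits",33.455237,-86.81964,"AL"
```

This only works because the data is so regular: a name containing a comma or a colon would shift the field numbers.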
 
[1] I shall use Perl regular expressions.