by tzs 4322 days ago

It's actually pretty easy and quick to write when the data is highly regular (no pun intended!). You can write this kind of expression by taking one of the lines of input:

   {"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},

and then making a regular expression that matches that line literally [1]:

   m/^{"n":"Homewood","i":"inns_suits","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
     _                                     _                    _
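
As a rough illustration of the "match the line literally" step, here is a minimal Python sketch (Python rather than Perl is a choice made for this example, and `matches_literally` is a hypothetical helper name). It compiles the hand-escaped pattern from above and, for comparison, lets `re.escape` do the escaping automatically. Note that the hand-written version leaves the decimal points unescaped, so each `.` matches any character, which still matches the literal line.

```python
import re

line = '{"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},'

# The hand-escaped pattern from the comment, carried over unchanged:
# only the square brackets are backslash-escaped, and ^ anchors the start.
hand_escaped = re.compile(
    r'^{"n":"Homewood","i":"inns_suits","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},'
)

# re.escape backslash-escapes every regex metacharacter in the text,
# so the resulting pattern matches exactly that text and nothing else.
auto_escaped = re.compile("^" + re.escape(line))

def matches_literally(pattern, text):
    """Return True if the compiled pattern matches at the start of text."""
    return pattern.match(text) is not None

print(matches_literally(hand_escaped, line))
print(matches_literally(auto_escaped, line))
```

Either pattern is only a starting point: it matches this one line, and the next steps swap the varying parts for capture groups.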

Then replace the parts that will vary with regular expressions to capture them. We want to capture the "n" field:

   m/^{"n":"(.*?)","i":"inns_suits","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
            _____

and the "i" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[33.455237,-86.81964\],"s":"AL","c":"1"},/
                        _____

and the latitude and longitude from the "p" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"AL","c":"1"},/
                                     _____ _____

and the "s" field:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"(.*?)","c":"1"},/
                                                        _____

We don't care about the "c" field, so I'm going to drop it:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[(.*?),(.*?)\],"s":"(.*?)"/

If we want to be fancy, we can make sure that the latitude and longitude consist only of digits, decimal points, and minus signs:

   m/^{"n":"(.*?)","i":"(.*?)","p":\[([\d.-]*?),([\d.-]*?)\],"s":"(.*?)"/
                                       ____       ____
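
To see the finished expression in action, here is a small sketch that applies it to the sample line. It uses Python's `re` module instead of Perl, which accepts this pattern unchanged; `parse_line` and the field names in its docstring are assumptions made for illustration, not something from the original data.

```python
import re

# The final pattern from above, written as a Python raw string.
PATTERN = re.compile(
    r'^{"n":"(.*?)","i":"(.*?)","p":\[([\d.-]*?),([\d.-]*?)\],"s":"(.*?)"'
)

def parse_line(line):
    """Return (n, i, latitude, longitude, s) as strings, or None if no match."""
    m = PATTERN.match(line)
    return m.groups() if m else None

sample = '{"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},'
print(parse_line(sample))
# -> ('Homewood', 'inns_suits', '33.455237', '-86.81964', 'AL')
```

Lines that deviate from the expected layout return `None` here, which makes it easy to spot input that is less regular than assumed.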

For a one-time thing like this, I'd probably handle the data with a pipeline in the shell rather than with regular expressions:

   tr : , < in | tr -d '[]' | cut -d , -f 2,4,6,7,9 > out.csv
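
To make it clearer what each stage of that pipeline does, here is a step-by-step sketch in Python that mirrors it for a single well-formed line. The function name `pipeline` is hypothetical, and unlike `cut` this version raises an error on lines with fewer than nine fields.

```python
def pipeline(line):
    """Mirror `tr : , | tr -d '[]' | cut -d , -f 2,4,6,7,9` for one line."""
    # tr : ,      -> turn every colon into a comma
    step1 = line.replace(":", ",")
    # tr -d '[]'  -> delete the square brackets
    step2 = step1.replace("[", "").replace("]", "")
    # cut -d , -f 2,4,6,7,9 -> keep those 1-based comma-separated fields
    fields = step2.split(",")
    return ",".join(fields[i - 1] for i in (2, 4, 6, 7, 9))

sample = '{"n":"Homewood","i":"inns_suits","p":[33.455237,-86.81964],"s":"AL","c":"1"},'
print(pipeline(sample))
# -> "Homewood","inns_suits",33.455237,-86.81964,"AL"
```

The string values keep their double quotes, and the approach relies on no field containing a colon or a comma of its own; a name with either would shift the field positions.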

[1] I shall use Perl regular expressions.