by dai_pole 4378 days ago
I had a go "just for fun" using curl, grep, sed, and tr. Probably too much regex?

    #!/bin/sh
    #
    # tickets.sh - A "no BS" ticket price scraper. Output in CSV format.
    #              Uses standard issue Unix utilities only.
    #              No soup for you!
    
    
    URL="http://philadelphia.craigslist.org"
    QUERY="firefly+tickets"
    
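    # Fetch the listing page, keep only the result rows, and let sed rewrite
    # each row as comma-separated fields ending in ':'. Commas and colons in
    # the page are blanked first so they can act as field/record separators.
    RESULTS=$(curl -s -m 10 "$URL/search/sss?sort=date&query=$QUERY" \
            | grep '<p class="row' \
            | sed -e 's!^[[:blank:]]*!!' \
                  -e 's!>[[:blank:]]*<!><!g' \
                  -e 's![,:]! !g' \
                  -e 's!<p class="row[^/]*"\([^"]*\)" class="[^#]*">&#x0024;\([0-9]\{1,\}\)</span>[^.]*>\([A-Z][a-z]\{2\} \{1,\}[0-9]\{1,2\}\)[^.]*<a h[^>]*\.html">\([^<]*\)</a>\([^.]*</p>\)!\1,$\2,\3,\4:!g' \
                  -e 's!   *! !g' \
                  -e 's!,  *!,!g' \
            | tr ':' '\n')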
    
    printf '%s\n' "$RESULTS"
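
If you want to poke at the parsing step without hitting the network, here is a rough offline sketch of the same trick: blank out stray commas and colons, pull the fields out with sed capture groups, end each record with ':' and let tr turn that into a newline. The sample markup, the `parse_rows` name, and the simplified pattern are all made up for illustration; this is not real Craigslist HTML and not the regex from the script above.

```shell
#!/bin/sh
# Offline sketch of the sed/tr technique. SAMPLE is invented markup for
# illustration only; parse_rows and its simplified pattern are hypothetical.

SAMPLE='<h4>search results</h4>
<p class="row"><a href="/tix/1.html" class="i"><span class="price">&#x0024;250</span></a><span class="date">Jun 12</span><a href="/tix/1.html">Firefly, 3-day pass</a></p>'

# Read HTML on stdin and print "$price,date,title" for each result row.
# Each row's own trailing newline survives tr, so a blank line follows
# every record.
parse_rows() {
    grep '<p class="row' \
    | sed -e 's![,:]! !g' \
          -e 's!.*&#x0024;\([0-9]\{1,\}\)</span>.*class="date">\([^<]*\)</span>.*\.html">\([^<]*\)</a>.*!$\1,\2,\3:!' \
          -e 's!   *! !g' \
    | tr ':' '\n'
}

printf '%s\n' "$SAMPLE" | parse_rows
```

For the sample row this prints `$250,Jun 12,Firefly 3-day pass` followed by a blank line. The comma in the title is gone because it was replaced with a space before the CSV commas were inserted.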