by rgarcia 4683 days ago
I used to use the network tab for stuff like this, but now I almost exclusively use mitmproxy[0]. Once things get sufficiently complicated, the constant scrolling and clicking around in the network tab feels tedious. Plus it's difficult to capture activity if a site has popups or multiple windows. mitmproxy solves these problems and also has a ton more features, like replaying requests and saving to files. My ideal tool would translate mitmdump output into code that performs the equivalent raw HTTP requests (e.g. using Python's requests). Sort of like Selenium's IDE, but for super lightweight scraping.

[0] http://mitmproxy.org/


mitmproxy sounds like a lot of overhead if all you want is your own raw HTTP traffic. You can get this without Python, and without mitmproxy. Also, I thought mitmproxy was intended for HTTPS. Even in that case, I'm not sure installing Python and mitmproxy is necessary if all you want is to view your own traffic. You can just run your own CA and a proxy that can terminate SSL (e.g., haproxy).

Below is a simple, _lightweight_ ngrep solution. RE means a regular expression. It saves only packets that match the RE, and only the first 1024 bytes of each rather than the full packet. 1024 is an arbitrary size chosen to cover all the HTTP headers; adjust to taste. tcpdump is there only because ngrep does not work well with PPPoE. If you don't use PPPoE, you don't need to include tcpdump.

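     case $# in
     1)
     # capture: tcpdump writes pcap data to stdout, ngrep reads it from
     # stdin (-I-) and saves matching packets to the pcap file named by $1
     tcpdump -Ulvvvns1024 -w- tcp 2>/dev/null \
     |ngrep -I- -O"$1" -qtWbyline 'GET|POST|HEAD' >/dev/null
     ;;
     2)
     # search the HTTP headers saved in pcap file $1 for RE $2
     ngrep -Wbyline -qtI"$1" "$2"
     ;;
     *)
     echo "usage: $0 pcap-file [RE]" >&2
     esac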

To dump your results, try

     $0 pcap-file . |less

And here's a little script to make URLs from your pcap file. unvis just decodes URLs according to the specs in RFCs 1808 and 1866. It assumes http:// URLs (no ftp://). The awk script removes duplicate URLs even when they are not consecutive.

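    case $# in
    [12])
    # 1. keep only the request line and the Host header of each request
    # 2. join everything into one line, then start a new line at each GET
    # 3. awk: print http://<host><path> once per distinct request
    # 4. turn %25 back into %, and the first "./" into "/"
    #    (the "." is how ngrep shows the CR after the host name)
    # 5. unvis decodes escapes; drop non-URL lines; re-encode spaces
    above-script "$1" "${2-.}" \
    |sed -n '
    /GET/p;
    /Host: /p;
    '  \
    |tr '\012' '\040' \
    |sed 's/GET/\
    &/g' \
    |awk '
    !($0 in a){a[$0];print "http://"$5$2}
    ' \
    |sed '
    s/%25/%/g;
    s/\.\//\//;
    ' \
    |unvis -hH \
    |sed '/^http:[/][/]./!d;
    s/ /%20/g' \
    ;;
    *)
    echo "usage: $0 pcap-file [RE]" >&2
    esac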
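
To see what the pipeline above does without a capture file, here is a minimal sketch that runs the same sed/tr/awk steps on a made-up header dump. The sample text and the function names sample_dump and headers_to_urls are illustrative assumptions, as is the premise that ngrep's byline output shows the CR at the end of each header line as a dot. unvis is left out so that this runs anywhere with sed, tr and awk.

```shell
# Hypothetical sample of `ngrep -Wbyline -qt` output (format assumed).
sample_dump() {
    cat <<'EOF'

T 2013/03/01 12:00:00.000000 192.0.2.10:51234 -> 192.0.2.20:80 [AP]
GET /index.html HTTP/1.1.
Host: example.com.
Accept: */*.
.

T 2013/03/01 12:00:01.000000 192.0.2.10:51235 -> 192.0.2.20:80 [AP]
GET /a%20b.png HTTP/1.1.
Host: example.com.
.

T 2013/03/01 12:00:02.000000 192.0.2.10:51236 -> 192.0.2.20:80 [AP]
GET /index.html HTTP/1.1.
Host: example.com.
Accept: */*.
.
EOF
}

# Same steps as the script above, minus unvis:
# keep GET and Host lines, put one request per line, print each
# distinct host+path once, then drop the "." left after the host.
headers_to_urls() {
    sed -n '/^GET /p; /^Host: /p' |
    tr '\n' ' ' |
    sed 's/GET /\
&/g' |
    awk 'NF >= 5 && !($0 in seen) { seen[$0]; print "http://" $5 $2 }' |
    sed 's/\.\//\//'
}

sample_dump | headers_to_urls
```

The third request repeats the first, so only two URLs come out. The awk array remembers every line it has seen, which is why duplicates are dropped even when they are not adjacent.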

It's trivial to dump HTTP. You can feed this to netcat (using sed to modify the HTML to your liking), then open the result in your browser. Whatever you are aiming to do (I'm still not exactly sure - can you give an example?), I reckon it can be automated without Python and heaps of libraries.
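
As a rough sketch of the netcat step, assuming a plain HTTP/1.0 GET is all that is needed: the function below only builds the request text. The name http_get_request and the example host are hypothetical, and the nc line is left as a comment because it needs network access and netcat flags differ between implementations.

```shell
# Build a minimal HTTP/1.0 GET request for host $1 and path $2.
# HTTP/1.0 is used so the server closes the connection after replying.
http_get_request() {
    printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$2" "$1"
}

# Show the request with the CRs stripped so it prints cleanly.
http_get_request example.com /index.html | tr -d '\r'

# To actually send it (needs network access):
#   http_get_request example.com /index.html | nc example.com 80 > response.txt
```

The saved response still contains the HTTP headers; strip or edit it with sed before opening it in a browser.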

Wow, I thought mitmproxy looked rough until I saw tcpdump/ngrep/awk. They both work, of course, but neither looks especially easy to use.

We've been using http://www.charlesproxy.com/ for years. It's a great tool (cheap, albeit not free).

The parent asked for "lightweight". Have you considered the size of the charlesproxy binaries?