by x3blah 2194 days ago
Instead of using Python, here is a solution that only requires sh, curl, sed, sort, uniq and grep.

This solution waits a generous 87s after each Amazon request. There are 328 films listed as "great movies" on rogerebert.com, so the script, named "1.sh", needs about 8h to complete (328 x 87s is roughly 7.9h), e.g., while you are at work or asleep. No cookies, no state, no problems.
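The run time quoted above can be checked with a few lines of sh. This is only a rough sketch: it counts the sleep time and ignores however long the requests themselves take. The function name total_seconds is made up for illustration.

```shell
#!/bin/sh
# Estimate the total sleep time for a fixed delay after each request.
# total_seconds is a hypothetical helper, not part of 1.sh.
total_seconds() {
  echo $(( $1 * $2 ))
}

films=328   # number of "great movies" quoted in the comment
delay=87    # seconds slept after each Amazon request

secs=$(total_seconds "$films" "$delay")
# Integer arithmetic: whole hours plus leftover minutes.
echo "$secs seconds = $((secs / 3600))h $((secs % 3600 / 60))m"
```

With the figures from the comment this prints 28536 seconds = 7h 55m, which is where the "8h" comes from.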

   Usage: sh 1.sh > 1.html
Open 1.html in a browser to see whether each "great movie" is available on Prime Video or only in some other format, such as Blu-ray, DVD, Multi-format, or Hardcover. Each entry links to the item on Amazon.
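To see how that page is put together, the last sed command of the script can be run on its own against a small sample. The input below is an assumption, not real Amazon output: it only imitates the shape the sed expects (a slug line, then indented HTML with a link and its text), and the /dp/EXAMPLE paths are placeholders. The function name format_page is made up for illustration.

```shell
#!/bin/sh
# The final sed command from 1.sh, unchanged.
# format_page is a hypothetical wrapper so it can be tried in isolation.
format_page() {
  sed '/^[^< ]/s/.*/@&,/;1s|.*|<base href=https://www.amazon.com />&|;s/ *//;/^$/d;/^[@<]/!s|$|</a>|;1s/@//;s/@/<br>/'
}

# Hypothetical input: slug lines start in column 1, HTML lines are indented.
out=$(printf '%s\n' \
  'the-third-man-1949' \
  '  <a class="a-link-normal a-text-bold" href="/dp/EXAMPLE1">' \
  '    Prime Video' \
  'casablanca-1942' \
  '  <a class="a-link-normal a-text-bold" href="/dp/EXAMPLE2">' \
  '    DVD' | format_page)

printf '%s\n' "$out"
```

On this sample the first line gets the base tag prepended, every later slug line starts with a line break tag, leading spaces are stripped, and the bare text lines ("Prime Video", "DVD") get a closing anchor tag appended. The base tag is what lets the relative Amazon links work from a local file.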

   #!/bin/sh

   # Stage 1: fetch the 16 JSON result pages and extract one slug per film.
   # Stage 2: search Amazon for each slug, keep the first matching link, then wait 87s.
   # Stage 3: turn the slugs and link fragments into a minimal HTML page.
   curl -HUser-Agent: -H'Accept: application/json' --compressed 'https://www.rogerebert.com/great-movies/page/[1-16]?utf8=%E2%9C%93&filters%5Btitle%5D=&sort%5Border%5D=newest&filters%5Byears%5D%5B%5D=1914&filters%5Byears%5D%5B%5D=2020&filters%5Bstar_rating%5D%5B%5D=0.0&filters%5Bstar_rating%5D%5B%5D=4.0&filters%5Bno_stars%5D=1' |
   grep -o "/reviews/great-movie-[^\\]*" |
   sed 's/.reviews.great-movie-//' |
   sort | uniq |
   while read x; do
     y=$(echo "$x" | sed 's/-/+/g')
     echo "$x"
     curl -s --compressed -HUser-Agent: "https://www.amazon.com/s/?k=$y" 2>/dev/null |
       grep -m1 -C4 a-link-normal.a-text-bold
     sleep 87
   done |
   sed '/^[^< ]/s/.*/@&,/;1s|.*|<base href=https://www.amazon.com />&|;s/ *//;/^$/d;/^[@<]/!s|$|</a>|;1s/@//;s/@/<br>/'
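The slug extraction at the start of the script can be tried offline. This is a sketch under one assumption: that the JSON response embeds HTML in a string, so the closing quote of each href is escaped with a backslash, which is what the grep pattern stops at. The sample line and the function names slug and query are made up for illustration.

```shell
#!/bin/sh
# Hypothetical fragment of the JSON response: HTML inside a JSON string,
# so each quote is preceded by a backslash.
sample='<a href=\"/reviews/great-movie-the-third-man-1949\">The Third Man</a>'

# Same grep and sed as in 1.sh: match the review path up to the first
# backslash, then strip the fixed prefix to leave only the slug.
slug() {
  grep -o "/reviews/great-movie-[^\\]*" | sed 's/.reviews.great-movie-//'
}

# Same substitution as in 1.sh: hyphens become "+" for the search query.
query() {
  sed 's/-/+/g'
}

x=$(printf '%s\n' "$sample" | slug)
y=$(printf '%s\n' "$x" | query)
echo "$x"
echo "$y"
```

For this sample the slug is the-third-man-1949 and the query string passed to the Amazon search URL is the+third+man+1949.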