by tyingq 3356 days ago
It looks like it's running regexes against the raw page HTML.

I would guess you'd have better luck parsing the HTML and extracting the href attributes of any <link> tags, the src attributes of any <script> tags, etc., and then pattern matching only against those values.
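
Something like this rough sketch is what I have in mind, using only Python's standard-library html.parser. The class name, the detect() helper, and the two signature patterns are made up for illustration; I don't know what patterns the tool actually uses.

```python
import re
from html.parser import HTMLParser


class AssetURLCollector(HTMLParser):
    """Collect href values from <link> tags and src values from <script> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None.
        attrs = dict(attrs)
        if tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])
        elif tag == "script" and attrs.get("src"):
            self.urls.append(attrs["src"])


# Hypothetical signatures, for illustration only.
SIGNATURES = {
    "jquery": re.compile(r"jquery[.-]", re.I),
    "bootstrap": re.compile(r"bootstrap", re.I),
}


def detect(html):
    """Return the names of signatures that match any extracted asset URL."""
    collector = AssetURLCollector()
    collector.feed(html)
    return sorted(
        name
        for name, pattern in SIGNATURES.items()
        if any(pattern.search(url) for url in collector.urls)
    )
```

Since the patterns only ever see the extracted URLs, a page that merely mentions "bootstrap" in its body text or in an inline script shouldn't trigger a match.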