Hacker News new | ask | show | jobs
by msantos 4446 days ago
A few webcrawlers[1] out there follow HTTP redirect headers and ignore the change in schemas (this method is different of OP's but achieves the same goal).

So anyone can create a trap link such as

    <a href="file:///etc/passwd">gold</a>
Or

   <a href="trap.html">trap</a> 
once trap.html is requested the server issues a header "Location: file:///etc/passwd"

Then it's just a matter of seat and wait for the result to show up wherever that spider shows its indexed results.

[1] https://github.com/scrapy/scrapy/issues/457