Hacker News new | ask | show | jobs
by greenone 3057 days ago
to clarify:

- marketing wants some tracking, some developers adds it

- ecommerce websites in the real world tend to "need" these tracking/conversion codes

- you do have legitimate get-requests like password-reset links with tokens, also we do use payment providers who send the customers back to us with get links which include payment tokens, newsletter-unsubscribe links are also often simple token links

- and yes normally a get-request should not change anything (at least not when its just repeated) but the sheer fact that they have access to it _and_ are crawling it is bad

my point being that I find it that they would just crawl everything they recorded instead of just crawling pages which are linked publicly or which are targeted in ad-campaigns combined with the fact that they don't warn you about it

1 comments

> my point being that I find it that they would just crawl everything they recorded instead of just crawling pages which are linked publicly or which are targeted in ad-campaigns

There's no way to know which pages are linked publicly without crawling every page for links. So you're right back at square one.

Ultimately if it's on a Internet-facing web server and not hidden behind an IP whitelist or secure login function then you have to assume it is public. All you are arguing is about different degrees of "public" which somewhat misses the real issue of website security.

Some crawlers do deliberately hit random URLs to check how you're handling 404s. Over crawlers are entirely dishonest and will try to find content that wasn't intended to be made public. How are you going to handle them if you're stumped with the Facebook crawlers that you invited onto your site?

> ...combined with the fact that they don't warn you about it

It's pretty obvious behavior in my opinion but maybe they could have been more explicit. However going back to my previous point, no other crawler advertises what it's going to crawl beforehand. So where do you draw the line? Ranting that Google indexed your site? What about visitors buying stuff on your ecommerce package without prior communication requesting access to the site?

You wouldn't ask customers in a bricks-and-mortar store to state their intentions the moment they walked through the shop door so why should every HTTP user agent have to do the same? While web security can be both complex and maddening, responsibility of hardening the site is still yours; not Facebook's.