Hacker News
by bitmedley 2432 days ago
Encoding your email address as decimal HTML character references may prevent some less sophisticated crawlers from capturing it.

mailto: -> &#109;&#097;&#105;&#108;&#116;&#111;:

abc@gmail.com -> &#097;&#098;&#099;&#064;&#103;&#109;&#097;&#105;&#108;&#046;&#099;&#111;&#109;

Then instead of:

<a href="mailto:abc@gmail.com">abc@gmail.com</a>

Use this:

<a href="mailto:&#097;&#098;&#099;&#064;&#103;&#109;&#097;&#105;&#108;&#046;&#099;&#111;&#109;"> &#097;&#098;&#099;&#064;&#103;&#109;&#097;&#105;&#108;&#046;&#099;&#111;&#109;</a>

Or this ("mailto:" also encoded):

<a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#097;&#098;&#099;&#064;&#103;&#109;&#097;&#105;&#108;&#046;&#099;&#111;&#109;"> &#097;&#098;&#099;&#064;&#103;&#109;&#097;&#105;&#108;&#046;&#099;&#111;&#109;</a>

source: http://www.wbwip.com/wbw/emailencoder.html
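
If you'd rather not depend on the online encoder, the same output can probably be produced with a few lines of Python. This is only a sketch: the function name `encode_decimal_entities` is made up for illustration, and it assumes you want every character written as a zero-padded three-digit decimal reference, as in the snippets above.

```python
import html


def encode_decimal_entities(text: str) -> str:
    """Write each character as a zero-padded decimal character reference."""
    # ord("a") is 97, so "a" becomes "&#097;"
    return "".join(f"&#{ord(ch):03d};" for ch in text)


address = "abc@gmail.com"
encoded = encode_decimal_entities(address)

# Build the same kind of link as shown above ("mailto:" left unencoded).
link = f'<a href="mailto:{encoded}">{encoded}</a>'
print(link)

# A browser decodes the references back to the plain address;
# html.unescape does the same thing here as a sanity check.
assert html.unescape(encoded) == address
```

Browsers decode the references before rendering, so visitors still see and click a normal address.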

1 comments

Wouldn't any decent HTML library, probably already used by the crawler, convert that back to plain text?
If you're crawling the web looking for email addresses you're probably not bothering to parse the HTML. You don't need to: you can just grab the email from the raw response from the web server, along with any new links to follow.
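
Both points can be checked with a rough sketch. Everything here is an assumption for illustration: the regex `EMAIL_RE` is a deliberately simple pattern, not what any real harvester uses, and `harvest_raw` / `harvest_decoded` are hypothetical names. The first scans the raw response body as described above; the second decodes character references first, which is roughly what an HTML library would do.

```python
import html
import re

# Simplistic email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# The encoded link from the post, as it would appear in the raw response.
ENCODED = "&#097;&#098;&#099;&#064;&#103;&#109;&#097;&#105;&#108;&#046;&#099;&#111;&#109;"
PAGE = f'<a href="mailto:{ENCODED}"> {ENCODED}</a>'


def harvest_raw(body: str) -> list[str]:
    """Scan the raw response text without any HTML handling."""
    return EMAIL_RE.findall(body)


def harvest_decoded(body: str) -> list[str]:
    """Decode character references first, then scan."""
    return EMAIL_RE.findall(html.unescape(body))


print(harvest_raw(PAGE))           # the raw text contains no literal "@"
print(set(harvest_decoded(PAGE)))  # decoding exposes the address
```

So the encoding only helps against crawlers that skip the decoding step; a single `html.unescape` call (or any HTML parser) undoes it.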