by sebazzz 2432 days ago
Wouldn't any decent HTML library, probably already used by the crawler, convert that back to plain text?

If you're crawling the web looking for email addresses, you're probably not bothering to parse the HTML. You don't need to: you can just pull the addresses straight out of the raw response from the web server, along with any new links to follow.
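
To make that concrete, here is a minimal sketch of what such a harvester might look like, assuming it just runs regexes over the response body. The patterns, the `scrape_raw` helper, and the sample page are all hypothetical and simplified, not taken from any real crawler. It also shows the other side of the argument: an address written with HTML entities slips past the raw scan, but a single `html.unescape` call (no full parser needed) brings it back.

```python
import html
import re

# Deliberately loose patterns; both are illustrative assumptions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LINK_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def scrape_raw(body: str):
    """Return (emails, links) found by regex, without parsing the HTML."""
    return EMAIL_RE.findall(body), LINK_RE.findall(body)


# Hypothetical response body: one plain address, one entity-encoded one.
raw = (
    '<p>Contact: <a href="mailto:alice@example.com">alice@example.com</a></p>'
    "<p>Obfuscated: bob&#64;example&#46;com</p>"
    '<a href="/about">About</a>'
)

# Scan the raw bytes as-is: the entity-encoded address is not matched.
emails, links = scrape_raw(raw)

# Decode character references first: the obfuscated address shows up too.
decoded_emails, _ = scrape_raw(html.unescape(raw))

print(sorted(set(emails)))
print(sorted(set(decoded_emails)))
print(links)
```

So entity encoding only helps against a harvester that skips the decoding step, which is cheap to add even without an HTML library.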