by novium 3576 days ago
On the topic of tracking emails that are "publicly posted on the web": how are you making sure that those emails were published with consent and don't originate from sources like database dumps? It still sounds somewhat unethical to use emails that weren't published with consent.
1 comment

Usually you'll see someone release site data, such as a database dump, via sites like pastebin. Either that, or the release is zipped and hosted somewhere for download. We're most likely going to have the crawler skip potentially risky sites, such as pastebin. Our crawler also will not have the ability to download and view actual files.
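For illustration only, a URL filter along these lines might look something like the sketch below. It is not the actual crawler code: `should_crawl`, `SKIPPED_HOSTS`, and `FILE_EXTENSIONS` are hypothetical names, and the list contents are assumptions apart from pastebin, which is the one site named above.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; only pastebin is named in the comment above.
SKIPPED_HOSTS = {"pastebin.com"}

# Hypothetical set of extensions treated as downloadable files, not pages.
FILE_EXTENSIONS = (".zip", ".gz", ".tar", ".7z", ".rar", ".sql", ".csv")


def should_crawl(url: str) -> bool:
    """Return True only for web pages on hosts that are not blocklisted."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Match the blocklisted host itself and any of its subdomains.
    if any(host == h or host.endswith("." + h) for h in SKIPPED_HOSTS):
        return False
    # Skip anything whose path looks like a file download.
    if parts.path.lower().endswith(FILE_EXTENSIONS):
        return False
    return True
```

An extension check like this only catches obvious cases. A stricter version would presumably also look at the response's Content-Type header before reading the body.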

Also, we are probably going to implement specific pages that allow anyone to delist their domain and/or email address from our service.
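As a rough sketch of how a delist check could work once those pages exist (the function name and the two sets are hypothetical, and how the lists are stored is an assumption):

```python
def is_delisted(email: str, delisted_emails: set[str], delisted_domains: set[str]) -> bool:
    """Return True if the address, or its whole domain, has opted out."""
    address = email.strip().lower()
    # Everything after the last "@" is treated as the domain.
    _, _, domain = address.rpartition("@")
    return address in delisted_emails or domain in delisted_domains
```

Running a check like this both when an address is first found and again before it is shown would let a delist request apply to data that was already collected.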