Hacker News
by unscrupulous_sw 2450 days ago
Data laundering

This means crawling or using illegally obtained datasets, then processing them with "machine learning" until you have enough plausible deniability to use them.

This could be used to bypass copyright. For example, you can remix stock photos you don't want to pay for. You can crawl a competitor's dating network to build similar-looking fake profiles. You can steal writings and automatically paraphrase them. You can steal algorithms by cloning their inputs and outputs. You can generate new porn by swapping faces and backgrounds.

An illegal dataset can also be used as a hidden input to improve your core product. For example, you can buy up stolen databases and logs and correlate the users across them. This can then be used for better ad targeting, using data that isn't even available to Google and Facebook.

6 comments

One area I've heard of this being used is when companies buy lists of email addresses of people in certain industries (which is usually not illegal per se) and then upload them to Facebook's advertising platform to do targeting (rather than spamming them directly).
If it's illegal, I'd consider that highly regulated. I never considered that this was even a thing, though, so interesting post.
Yes! I run a data science department that's involved with digital advertising. We have client files that are restricted and regularly destroyed. Sometimes I wonder how many competitors use those files to build better ad targeting systems.
Nice name for it.
What you're describing sounds like taking photos on the street. I hope that's never regulated.
Are there any public examples of this?