by eesmith 639 days ago
I assumed it was email messages, not email addresses, and that the de-duplication was for things like mass mailings, spam, and the like.

(As one example, strip tracking URLs and web bugs to identify that two messages with different bytes are auto-generated from the same template.)
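
To make that concrete, here is a rough sketch of what I have in mind, not a description of what anyone actually runs. The parameter names (utm_*, mc_eid, trk), the "1x1 image" test for a web bug, and the function name template_fingerprint are all my own assumptions for illustration; real trackers vary a lot more than this.

```python
import hashlib
import re

# Assumed tracking parameters; a real list would be much longer.
TRACKING_PARAM = re.compile(
    r"(?:[?&]|&amp;)(?:utm_[a-z]+|mc_eid|trk)=[^&\s\"'>]*",
    re.IGNORECASE,
)

# Assumed web-bug shape: an <img> tag with width or height of exactly 1.
WEB_BUG = re.compile(
    r"<img\b[^>]*\b(?:width|height)=[\"']?1[\"']?(?=[\s/>])[^>]*>",
    re.IGNORECASE,
)


def template_fingerprint(body: str) -> str:
    """Hash a message body after dropping per-recipient tracking residue."""
    text = WEB_BUG.sub("", body)          # drop tracking pixels
    text = TRACKING_PARAM.sub("", text)   # drop tracking query parameters
    text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


msg_a = (
    '<p>Sale today! <a href="https://example.com/deal'
    '?utm_source=news&mc_eid=abc123">Shop</a></p>'
    '<img src="https://t.example.com/o/111.gif" width="1" height="1">'
)
msg_b = (
    '<p>Sale today! <a href="https://example.com/deal'
    '?utm_source=news&mc_eid=zzz999">Shop</a></p>'
    '<img src="https://t.example.com/o/222.gif" width="1" height="1">'
)

# Different bytes, same fingerprint once the tracking residue is removed.
same_template = template_fingerprint(msg_a) == template_fingerprint(msg_b)
```

The stripped URLs are not meant to stay valid links; the only goal is that two copies of the same mailing hash to the same value. An exact hash also misses templates that insert the recipient's name or other personalized text, so a fuzzy comparison would likely be needed on top of this.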

With email messages it would be possible to train based on interests, for example to get a generative AI to create more targeted phishing messages.