Hacker News
by kettleballroll 1502 days ago
I always assumed spam filtering was a solved problem. I've never had any issues with e.g. ProtonMail once I've trained it on a significant body of mail (e.g. all my current spam). I'm curious: how many positive/negative samples did you use, and how much time did you give the system to adapt?
1 comments

The last time I gave it a serious try, back in 2019, I gave it ~120000 non-spam samples (several years of real emails) and ~25000 spam samples (1 month of spam).

After that it was getting about 5% false positives (so 1 in 20 real emails went to spam) and about 3% false negatives. For me, 3% false negatives means about 25 spam messages reaching my inbox every day.

Gmail gives me about 0.5% false positives (1 in 200) and 0.01% false negatives.
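
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It uses only the numbers quoted above (~25000 spam in one month, and the stated error rates). The 30-day month and the function names are my own assumptions for illustration, not anything from a real filter.

```python
# Back-of-the-envelope check of the rates quoted in this comment.
SPAM_PER_MONTH = 25_000   # from the comment: ~25000 spam samples in 1 month
DAYS_PER_MONTH = 30       # assumption: treat the month as 30 days


def missed_spam_per_day(false_negative_rate: float) -> float:
    """Spam messages per day that slip into the inbox at a given false-negative rate."""
    return SPAM_PER_MONTH / DAYS_PER_MONTH * false_negative_rate


def one_in_n(rate: float) -> int:
    """Express a rate as '1 in N', e.g. 0.05 becomes 20."""
    return round(1 / rate)


self_trained = missed_spam_per_day(0.03)    # 3% false negatives
gmail = missed_spam_per_day(0.0001)         # 0.01% false negatives

print(f"self-trained filter: {self_trained:.1f} spam/day in inbox")
print(f"Gmail: {gmail:.3f} spam/day in inbox")
print(f"false positives: 1 in {one_in_n(0.05)} vs 1 in {one_in_n(0.005)}")
```

Under those assumptions the 3% rate works out to roughly 25 spam messages a day, while 0.01% is well under one a day.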