by pjsg 4049 days ago
If I am to invest the time and energy to switch my current anti-spam solution over to this, then I want to have some level of assurance that it will be more effective than my current scheme.

I agree that spam is a moving target, and that is why anti-spam systems need constant updating. My current system (over the last 30 days) rejected 87% (around 45k emails) and accepted 13%. Of that 13% (6600), around 300 were classified as spam by the Bayesian classifier in Thunderbird. Around 80 were manually classified as spam and added to Thunderbird's rules. The Thunderbird classifier probably classified 2 ham messages as spam. I don't know of any ham->spam errors in the initial filtering phase.
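To make those figures easier to compare against another filter, here is a rough sketch of the arithmetic. It uses only the approximate numbers quoted above; the function name and the derived metrics are my own illustration, and it assumes (as stated) that the first stage rejected no ham and that the 2 suspected false positives are among the 300 Bayesian hits.

```python
def summarize(rejected, accepted, bayes_spam, manual_spam, bayes_false_pos):
    """Derive rough rates from the 30-day counts (all inputs approximate)."""
    total = rejected + accepted
    # Spam the Bayesian stage really caught, minus the suspected ham.
    bayes_true_spam = bayes_spam - bayes_false_pos
    # Spam that got past the first (reject) stage.
    spam_past_first = bayes_true_spam + manual_spam
    # Assumption: everything rejected in the first stage was spam.
    total_spam = rejected + spam_past_first
    return {
        "reject_share": rejected / total,
        "first_stage_recall": rejected / total_spam,
        "bayes_recall": bayes_true_spam / spam_past_first,
        "spam_missed_overall": manual_spam / total_spam,
    }


stats = summarize(rejected=45_000, accepted=6_600,
                  bayes_spam=300, manual_spam=80, bayes_false_pos=2)
for name, value in stats.items():
    print(f"{name}: {value:.2%}")
```

Any replacement would need to beat the first-stage recall and the overall miss rate computed here, without adding ham->spam errors, to be worth the switch.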

Should rspamd be expected to do better, about the same, or worse?

2 comments

From what you are saying, I conclude that you give the statistical classifier a very high score (or rely solely on statistics). That is not an option for a system with millions of users (their accept/reject rate is close to 70/30 percent, as we cannot rely on Bayes at all). Therefore, I have never evaluated Bayes as a single classifier. Nevertheless, rspamd uses OSB-Bayes as its statistical algorithm, which has proven to be a good classifier.
For similar systems (i.e. small, but with good manual classification when all else fails), I suspect that if more of them used razor (or, again, something similar) we'd achieve better results (razor allows this data to be shared automatically).
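For readers unfamiliar with OSB (orthogonal sparse bigrams), the idea is to feed the Bayesian classifier word pairs instead of single words: each token is paired with each of the few tokens before it, and the number of skipped words is kept as part of the feature. The sketch below is only an illustration of that feature generation. It is not rspamd's tokenizer, and the function name and the window size of 5 are assumptions.

```python
def osb_features(tokens, window=5):
    """Return (earlier_token, skipped_count, current_token) triples.

    Illustrative only: pairs each token with up to window - 1
    preceding tokens, recording how many words lie between them.
    """
    features = []
    for i, current in enumerate(tokens):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break  # ran off the start of the message
            features.append((tokens[j], distance - 1, current))
    return features


pairs = osb_features("buy cheap pills now".split())
for earlier, skipped, current in pairs:
    print(earlier, skipped, current)
```

Each triple would then be counted per class (spam or ham) in the same way single words are in a plain Bayesian filter.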