I think you, as the GP, are falling victim to the way Google analyzes its data. They probably aggregate spam markings across all users, and since too many other users do not mark it as spam, your own marking does not carry much weight. You basically can't train the model enough on your own. Just guessing, though.