That's why I wouldn't worry about how it performs in general, but about how it performs in edge cases. The question isn't whether it's memorizing the whole dataset, but whether it's "memorizing" any particular points it shouldn't. It's kinda like polynomial regression, where the fit swings more wildly at the ends than in the middle. Predictions in different parts of the space have different variances, and some are determined much more strongly by single data points.
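
A rough sketch of the polynomial analogy, assuming a plain least-squares fit on evenly spaced points (the `leverage` helper, the degree, and the grid are made up for illustration, and none of this comes from the email model itself). The leverage of point i is the i-th diagonal entry of the hat matrix, i.e. how much the fitted value at that point moves when its own label moves:

```python
import numpy as np

def leverage(x, degree):
    """Diagonal of the hat matrix for a least-squares polynomial fit.

    Hypothetical helper for illustration only.
    """
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    # With a thin QR factorization X = QR, the hat matrix is H = Q Q^T,
    # so its diagonal is the squared row norms of Q.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

x = np.linspace(-1.0, 1.0, 21)   # evenly spaced inputs (assumed)
h = leverage(x, degree=5)        # degree 5 is an arbitrary choice

print("leverage at the ends:  ", h[0], h[-1])
print("leverage in the middle:", h[len(x) // 2])
print("sum of leverages:      ", h.sum())  # equals the number of coefficients
```

The end points carry more leverage than the middle ones, so the fit there leans harder on individual data points. That's the kind of spot where I'd go looking for "memorization".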

I have no doubt that this will do great across the vast majority of the email space, but I wonder whether it will leak private data anywhere at all.