by Houshalter 3885 days ago
Overfitting requires "memorizing" the dataset instead of generalizing from it. I think that's very unlikely: the neural network's parameters can only store so many bits of information, and the dataset is millions of times bigger.
1 comments

That's why I wouldn't worry about how it performs in general, only about edge cases. The question isn't whether it's memorizing the whole dataset, but whether it's "memorizing" any particular points it shouldn't. It's kind of like polynomial regression, where the ends of the fit swing more wildly than the middle. Predictions in different parts of the space have different variances, and some are determined much more strongly by single data points.
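
To make the polynomial analogy concrete, here's a rough sketch that computes the leverage of each point in a polynomial least-squares fit. Leverage is the diagonal of the hat matrix, and it measures how strongly a fitted value depends on its own data point. The grid of 21 evenly spaced points and the degree of 5 are made up for illustration, and the `polynomial_leverage` helper is just a name I picked. This is plain least squares, not a neural network.

```python
import numpy as np

def polynomial_leverage(x, degree):
    """Leverage (hat-matrix diagonal) of each point in a polynomial least-squares fit."""
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    # With the reduced QR factorization X = QR, the hat matrix is Q @ Q.T,
    # so its diagonal is the row-wise sum of squares of Q.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

# Hypothetical setup: 21 evenly spaced inputs, degree-5 polynomial.
x = np.linspace(-1.0, 1.0, 21)
lev = polynomial_leverage(x, degree=5)

print("leverage at left end:", lev[0])
print("leverage in middle:  ", lev[len(x) // 2])
print("leverage at right end:", lev[-1])
```

The leverages always sum to the number of fitted coefficients (6 here), so a point with high leverage is using up a disproportionate share of the model's capacity. On this grid the end points come out with higher leverage than the middle one, which is the "ends go wild" effect.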

I have no doubt that in the vast majority of the email space this will do great, but I wonder whether it will leak private information anywhere at all.