by bitL 2855 days ago
A few petabytes in some cases. Testing the models at that scale requires some advanced balanced sampling in Spark.
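For anyone wondering what "balanced sampling" can look like in practice, here is a rough sketch in plain Python, not the setup described above. It assumes a per-class (stratified) approach: compute a sampling fraction for each label so that rare and common classes end up at roughly the same size. The names `balanced_fractions`, `sample_by`, and the toy dataset are hypothetical and only for illustration.

```python
import random
from collections import Counter

def balanced_fractions(class_counts, target_per_class):
    """Per-class sampling fractions so each class yields roughly target_per_class rows."""
    return {
        label: min(1.0, target_per_class / count)
        for label, count in class_counts.items()
        if count > 0
    }

def sample_by(rows, label_of, fractions, seed=0):
    """Keep each row with probability equal to its class's fraction.

    Labels missing from `fractions` are dropped (fraction 0.0).
    """
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fractions.get(label_of(r), 0.0)]

# Hypothetical skewed dataset: 10,000 negatives and 100 positives.
rows = [("neg", i) for i in range(10_000)] + [("pos", i) for i in range(100)]
counts = Counter(label for label, _ in rows)

fractions = balanced_fractions(counts, target_per_class=100)
sample = sample_by(rows, lambda r: r[0], fractions, seed=42)
sampled_counts = Counter(label for label, _ in sample)
```

In PySpark, a fractions dictionary like this can be passed to `DataFrame.sampleBy(col, fractions, seed)`, which samples each stratum independently. The per-class sizes it returns are approximate rather than exact, so at petabyte scale the class counts themselves would normally come from a prior aggregation pass.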