
Try stratified sampling. Removing a random share of the records without claims only increases the variance of the denominator, which is much less variable than the numerator. You can actually eliminate the majority of the data and get results that are the same to several decimal places. Note that this only works with very large datasets that do not contain extremely high-cardinality variables.
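A minimal Python sketch of the idea follows. It assumes the quantity of interest is a claim frequency (total claims divided by total exposure) and uses a synthetic book of policies. The helper names `simulate_policies`, `stratified_sample` and `frequency` are hypothetical. Every record with a claim is kept. Non-claim records are kept with probability `keep_rate` and weighted by `1 / keep_rate`, so the weighted exposure total stays unbiased. Because no claims are dropped, the numerator is exact and only the denominator picks up extra noise.

```python
import random


def simulate_policies(n, claim_rate=0.05, seed=0):
    """Hypothetical synthetic book: one (exposure_years, claim_count) tuple per record."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        exposure = rng.uniform(0.1, 1.0)
        # Rare claims: at most one claim, probability proportional to exposure.
        claims = 1 if rng.random() < claim_rate * exposure else 0
        records.append((exposure, claims))
    return records


def frequency(records, weights=None):
    """Claim frequency = (weighted) total claims / (weighted) total exposure."""
    if weights is None:
        weights = [1.0] * len(records)
    total_claims = sum(w * c for w, (_, c) in zip(weights, records))
    total_exposure = sum(w * e for w, (e, _) in zip(weights, records))
    return total_claims / total_exposure


def stratified_sample(records, keep_rate, seed=1):
    """Keep all claim records (weight 1); keep non-claim records with
    probability keep_rate and weight them by 1 / keep_rate."""
    rng = random.Random(seed)
    sample, weights = [], []
    for exposure, claims in records:
        if claims > 0:
            sample.append((exposure, claims))
            weights.append(1.0)
        elif rng.random() < keep_rate:
            sample.append((exposure, claims))
            weights.append(1.0 / keep_rate)
    return sample, weights


records = simulate_policies(500_000)
full = frequency(records)
sample, weights = stratified_sample(records, keep_rate=0.05)
approx = frequency(sample, weights)

print(f"full data:  {full:.5f} from {len(records):,} records")
print(f"stratified: {approx:.5f} from {len(sample):,} records")
```

Here well under a tenth of the rows survive, yet the two frequencies should be close. The extra variance in the denominator shrinks as the non-claim pool grows, which is why this pays off on very large datasets. The same weights would be passed as case weights when fitting a frequency model on the sample.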

That said, 50,000 is too few. For a dataset of this size, 20 million records is likely more reasonable. The actual answer depends on the variance of the individual predictors and their correlation with each other.
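One way to see why predictor variance and correlation drive the required size is the textbook variance of an ordinary least squares coefficient. This is an illustration for linear regression, not necessarily the model in question; claim-frequency GLMs behave analogously through their information matrix. Here $\sigma^2$ is the residual variance, $s_j^2$ the sample variance of predictor $j$, $R_j^2$ the $R^2$ from regressing $x_j$ on the other predictors, and $\mathrm{SE}_j$ a target standard error you would choose:

```latex
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{n \, s_j^2 \, (1 - R_j^2)}
\qquad\Longrightarrow\qquad
n = \frac{\sigma^2}{\mathrm{SE}_j^2 \, s_j^2 \, (1 - R_j^2)}
```

Predictors with little variance (for example, indicators for rare levels of a categorical variable) or strong correlation with other predictors ($R_j^2$ near 1) both push the required $n$ up.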