For more interesting problems than "which country has the highest GDP?", it's about more than just sloppy data. If you want to include any covariates, how do you know which ones to include? You could try to include everything predictive, but then you'll end up using something like the client margin column to predict client revenue, which is leakage since margin is itself derived from revenue. Or you'll control for a column that's causally downstream of the treatment, biasing your estimates, like estimating revenue differences while controlling for page views in an experiment that affects page views. So much of what's crucial to using our databases just isn't recorded in them, and that's not about sloppiness.
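To make the page views case concrete, here's a rough simulation sketch. Everything in it is made up for illustration: the data-generating process, the effect sizes (treatment adds 2 page views, each page view is worth 5 in revenue, so the true total effect on revenue is 10), and the helper names `ols` and `simulate`. It assumes the treatment moves revenue only through page views, which is the extreme version of the problem.

```python
import numpy as np


def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=20_000, seed=0):
    """Hypothetical experiment: treatment -> page views -> revenue."""
    rng = np.random.default_rng(seed)
    # Randomized 50/50 assignment, so there is no confounding to adjust for.
    treated = rng.integers(0, 2, size=n).astype(float)
    # Assumed: treatment raises page views by 2 on average.
    page_views = 10.0 + 2.0 * treated + rng.normal(0.0, 1.0, size=n)
    # Assumed: revenue depends on treatment only through page views.
    revenue = 5.0 * page_views + rng.normal(0.0, 1.0, size=n)
    return treated, page_views, revenue


treated, page_views, revenue = simulate()

# Plain difference in means, written as a regression on the treatment flag.
unadjusted = ols([treated], revenue)[1]

# Same regression, but "controlling for" the downstream page views column.
adjusted = ols([treated, page_views], revenue)[1]

print(f"treatment effect, no controls:       {unadjusted:.2f}")
print(f"treatment effect, page views added:  {adjusted:.2f}")
```

The first estimate should land near the true effect of 10, and the second should land near 0, because the control absorbs the very path the treatment works through. Nothing in the table itself tells you that page views sits downstream of the treatment; that knowledge lives outside the database.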