by jvans 1013 days ago
Building your own Bayesian model with something like PyMC3 is also a very reasonable approach with small data, or with data too noisy to detect effects in a timely manner. It also forces you to think about the underlying distributions that generate your data, which is an exercise in itself and can yield interesting insights.
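For a sense of how small such a model can be, here is a minimal sketch of a Beta-Binomial A/B comparison. It uses plain Python and a conjugate update rather than PyMC3, and the conversion counts, the flat Beta(1, 1) prior, and the function name are all made up for illustration.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Assumes independent flat Beta(1, 1) priors on each conversion rate,
    an illustrative choice rather than a recommendation.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Conjugate update: the posterior is Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical small-sample experiment: 12/200 vs 21/200 conversions.
p = prob_b_beats_a(12, 200, 21, 200)
print(f"P(B > A) ~ {p:.2f}")
```

Writing down the prior and the likelihood explicitly is the "think about the underlying distributions" part; a PyMC3 model would state the same assumptions and sample the posterior for you.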
3 comments

[Author here] Heh - yes but don't, though...

Yes: you could use Bayesian priors and a custom model to give yourself more confidence from less data. But...

Don't: for most businesses that are so early they can't get enough users to hit stat-sig, you're likely better off putting your engineering effort into making the product better instead of building custom statistical models. This is nerd-sniping-adjacent (https://xkcd.com/356/), a common trap engineers can fall into: it's more fun to solve the novel technical problem than the actual business problem.

Though: there is a small set of companies with large scale but small data, for whom the custom stats approaches _do_ make sense. When I was at Opendoor, even though we had billions of dollars of GMV, we only bought a few thousand homes a month, so the Data Science folks used fun statistical approaches like Pair Matching (https://www.rockstepsolutions.com/blog/pair-matching/) and CUPED (now available off the shelf - https://www.geteppo.com/features/cuped) to squeeze a bit more signal from less data.
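For anyone curious what CUPED does mechanically, here is a rough sketch under simplifying assumptions: a single pre-experiment covariate per unit, and theta estimated over whatever units are passed in (in practice it is usually estimated on treatment and control pooled together). The data and function name are hypothetical, and this is not how Opendoor or Eppo implement it.

```python
from statistics import mean, variance

def cuped_adjust(metric, covariate):
    """Return the metric with the part explained by a pre-experiment covariate removed."""
    x_bar, y_bar = mean(covariate), mean(metric)
    # theta = cov(X, Y) / var(X), i.e. the OLS slope of the metric on the covariate.
    cov_xy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(covariate, metric)
    ) / (len(metric) - 1)
    theta = cov_xy / variance(covariate)
    # Subtracting theta * (x - x_bar) leaves the mean unchanged but shrinks the variance.
    return [y - theta * (x - x_bar) for x, y in zip(covariate, metric)]

# Hypothetical data: pre-period value (covariate) and in-experiment value (metric).
pre = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
post = [11.0, 13.5, 9.5, 16.0, 11.5, 15.5]
adjusted = cuped_adjust(post, pre)
print(variance(post), variance(adjusted))
```

The adjusted metric keeps the same mean but has lower variance whenever the covariate correlates with the metric, which is where the extra signal from less data comes from.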

I love fitting models.

I always say that in my profession I will fit models for free; it’s having to clean data and “finish” a project that I demand payment for.

...and pictures in a format the journal likes....
That works for a website. It doesn't work as well for direct mail.