by tech_ken 1055 days ago
How does a typical NHST A/B system resolve this?

You tend to stop while the experiment is running, and then spend time looking at the results once it's done.

The real benefits here are getting a better understanding of what levers drive your product metrics, as you'll inevitably mess up the first n or so experiments (if I could give you only one piece of advice, it would be to use stratified randomisation, but everyone seems to have to make this mistake for themselves).

Advice appreciated, but I'm exceedingly familiar with experimental design haha. What I understand far less well is how the toolset integrates into a business/product development context. I can see how a staggered cadence of stopping, reflecting on the experimental design, and making a decision is wise. But it still seems that you could run the experiment with a MAB to keep the profit motive happy (you don't want to waste potential click-throughs just because you like p-values; maybe tune it to be more conservative about shifting heavily to one arm) and then have some period where you stop the experiment to pause and reflect.
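To make the "MAB, but conservative about shifting heavily to one arm" idea concrete, here is a minimal sketch using Thompson sampling with a fixed share of traffic that is always split uniformly. Everything in it is an assumption for illustration: the function names, the `explore_prob` knob, the Beta(1, 1) priors, and the simulated conversion rates. It is one way to tune a bandit conservatively, not the only one.

```python
import random


def choose_arm(successes, failures, rng, explore_prob=0.2):
    """Pick an arm index for the next visitor.

    With probability explore_prob the arm is chosen uniformly at random,
    so no arm is ever starved of traffic. Otherwise use Thompson sampling:
    draw a conversion rate from each arm's Beta posterior and pick the max.
    """
    n_arms = len(successes)
    if rng.random() < explore_prob:
        return rng.randrange(n_arms)
    # Beta(1 + s, 1 + f) is the posterior under a uniform Beta(1, 1) prior.
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(n_arms), key=draws.__getitem__)


def simulate(true_rates, n_visitors, seed=0, explore_prob=0.2):
    """Simulate n_visitors against hypothetical true conversion rates."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_visitors):
        arm = choose_arm(successes, failures, rng, explore_prob)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = simulate([0.05, 0.10], n_visitors=5000)
    for arm, (wins, losses) in enumerate(zip(s, f)):
        print(f"arm {arm}: {wins + losses} visitors, {wins} conversions")
```

Raising `explore_prob` keeps the allocation closer to an even split (easier to analyse afterwards); lowering it lets the bandit chase the apparent winner harder.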

Heck you don't even have to do MAB if you don't want to, just don't use NHST. The Bayesian "flavor" of NHST (credible intervals around posterior expected values) has absolutely no problem with optional stopping. Run the experiment until you've got a precise enough estimate, then sit back and make your product decisions.
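As a rough sketch of "run until you've got a precise enough estimate": the code below keeps collecting simulated traffic in batches and stops once the 95% credible interval for the difference in conversion rates is narrower than a target width. The function names, the Beta(1, 1) priors, the batch size, and the rates are all assumptions for illustration, and the interval is approximated by Monte Carlo draws rather than computed exactly.

```python
import random


def credible_interval_for_lift(conv_a, n_a, conv_b, n_b, rng,
                               draws=4000, mass=0.95):
    """Monte Carlo credible interval for (rate_b - rate_a).

    Each arm gets a Beta(1, 1) prior, so its posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    diffs = sorted(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        - rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    lo = diffs[int((1 - mass) / 2 * draws)]
    hi = diffs[int((1 + mass) / 2 * draws) - 1]
    return lo, hi


def run_until_precise(rate_a, rate_b, target_width=0.02,
                      batch=500, max_n=200_000, seed=0):
    """Add `batch` visitors per arm until the interval is narrow enough.

    Returns (visitors per arm, interval lower bound, interval upper bound).
    """
    rng = random.Random(seed)
    conv_a = conv_b = n = 0
    while n < max_n:
        for _ in range(batch):
            # A bool adds as 0 or 1, so this counts conversions.
            conv_a += rng.random() < rate_a
            conv_b += rng.random() < rate_b
        n += batch
        lo, hi = credible_interval_for_lift(conv_a, n, conv_b, n, rng)
        if hi - lo <= target_width:
            break
    return n, lo, hi


if __name__ == "__main__":
    n, lo, hi = run_until_precise(0.05, 0.06)
    print(f"stopped at {n} visitors per arm; 95% interval [{lo:.4f}, {hi:.4f}]")
```

Note the stopping rule here depends only on how wide the interval is, not on whether it excludes zero, which is what "precise enough" is taken to mean in this sketch.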

I guess where I'm going with all this is that it seems like the post's strongest point is "good product decisions require time, and realtime analytics bamboozle us into thinking fast decisions are better". All the stuff about NHST seems kind of tangential. Looking at it again, I see that it's about a decade old, which is probably why they were targeting NHST so aggressively. I would hope in our post-replication crisis world (hopefully "post", anyways) data scientists and A/B testers are more prudent about some of these better-known pitfalls.