Hey! I'm the author of Confidence.js. Emily Malcolm and I have been working hard on this new approach for the past few weeks and we're super excited to share it!
>First, we use Chi Squared Tests to determine if differences in the A/B test data are meaningful or not.
Could you explain why you've chosen to take a binary approach to determining whether differences are meaningful or not?
To me it seems like a continuous approach would be both more useful and more realistic. Creating an artificial threshold for significance seems a bit silly (and it also makes the model harder to use, because different applications might need different significance levels to justify an action).
From my perspective, every data point contains information, and if you wait for significance you're essentially ignoring early information.
Edit: Also, when switching costs are small, significance levels become mostly pointless and you just want to switch to the best A/B option immediately. As evidence swings the other way, you just switch back.
Hi! This is Emily, the stats brains behind this. My research indicated that 80% significance was commonly used for A/B testing (and of course any significance level >80% would be even more conservative). I definitely see your point about the advantage of a continuous test, and I think that for an expert, being able to see the exact significance level for the test would be very useful. However, our thinking was that the average user might not have the expertise to know what a "good enough" level of significance would be for a given test. Rather than having to educate every user on what significance means and how to interpret it, we decided that a "yes, significant" or "no, not significant" answer would be more easily interpreted by all, regardless of their statistical background. If there is demand for a more continuous approach, it could certainly be implemented easily.
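For anyone curious what the two outputs look like side by side, here is a minimal sketch in Python. It is not Confidence.js's actual code or API: the function names are hypothetical, it assumes a plain Pearson chi-squared test on a 2x2 table with no continuity correction, and it assumes "80% significance" means `1 - p_value >= 0.80`.

```python
import math


def chi_squared_2x2(conv_a, total_a, conv_b, total_b):
    """Pearson chi-squared statistic for a 2x2 conversion table.

    Hypothetical helper: rows are variants A and B, columns are
    converted / not converted. No continuity correction is applied.
    """
    observed = [
        [conv_a, total_a - conv_a],
        [conv_b, total_b - conv_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [observed[0][0] + observed[1][0],
                  observed[0][1] + observed[1][1]]
    grand_total = total_a + total_b

    # If a whole row or column is empty, the test cannot detect a difference.
    if 0 in row_totals or 0 in col_totals:
        return 0.0

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom.

    A chi-squared variable with 1 df is a squared standard normal, so its
    survival function is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


def summarize(conv_a, total_a, conv_b, total_b, threshold=0.80):
    """Return both the continuous result and the binary answer.

    `threshold` is an assumed reading of "80% significance".
    """
    stat = chi_squared_2x2(conv_a, total_a, conv_b, total_b)
    p = p_value_df1(stat)
    confidence = 1.0 - p
    return {
        "chi2": stat,
        "p_value": p,
        "confidence": confidence,
        "significant": confidence >= threshold,
    }


if __name__ == "__main__":
    # Variant A: 10 conversions out of 100; variant B: 20 out of 100.
    print(summarize(10, 100, 20, 100))
```

The binary answer is just the continuous `confidence` value compared against a fixed cutoff, so a library could return both and let experts read the number while everyone else reads the yes/no.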
Thanks for the reply. I guess it's a good reminder for everyone building statistical products (or even non-statistical products) that the goal is not an optimal formula, but a result that makes users comfortable.