In 2014 I wrote an article on why Optimizely's approach to A/B testing was statistically flawed [1]. I was working at a competitor, so I needed to be a bit circumspect.
It's a bit breathtaking how basic the statistics knowledge needed to make this critique is (not complaining about the critique, mind you). I'm startled that this came as news to anyone.
Have you ever given thought to generalizability in A/B tests? I'm surprised there isn't more of a developing science of constructs that generally work...
Optimizely switched to a frequentist statistics model in 2015, which changed how pretty much all testing companies do stats. Your article was valid for a full year.