This is the most important critique of A/B testing. It far outweighs the traditional hoopla about simultaneous inference and Bonferroni corrections.
Epsilon-greedy does well on k-armed bandit problems, but in most applications you can likely do significantly better by customizing the strategy to individual users. That's a contextual bandit, and there are simple strategies that do pretty well here too. For instance: