Interesting idea - depends on how you look at performance, right? There’s engineering overhead of setting up a MAB and it likely requires the skill set of a data scientist or ML engineer to interpret - a scarce resource in practice. Making it a less ideal approach for most bootstrapped teams.
From the accuracy point of view - With the MAB approach, you’d still be at risk of ignoring changes over time (or at least being slower to catch changes because of the historical bias) - meaning MAB may not be best suited for understanding the impact over longer periods of time like this approach would.
i.e., does a bandit represent the upper performance limit of ignoring the distinction between causation and correlation?