One trick I've used myself in a Bayesian bandit-style approach (Thompson sampling from some distribution, e.g. a Beta distribution) is to apply a "forgetting rate" to the parameters of the distribution.
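
For anyone curious, here is a minimal Python sketch of one way a forgetting rate could work with Beta-Bernoulli Thompson sampling. It is only an illustration under stated assumptions: the class `ForgettingBetaArm`, the function `thompson_step`, the `forgetting_rate` parameter, and the choice to decay the counts toward a Beta(1, 1) prior are all hypothetical names and choices made for this sketch, not any particular library's API.

```python
import random


class ForgettingBetaArm:
    """One bandit arm with a Beta posterior whose evidence decays over time.

    Hypothetical helper for illustration only.
    """

    def __init__(self, prior_alpha=1.0, prior_beta=1.0, forgetting_rate=0.01):
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta
        # Assumption: fraction of accumulated evidence dropped at each step.
        self.forgetting_rate = forgetting_rate
        self.alpha = prior_alpha
        self.beta = prior_beta

    def decay(self):
        # Shrink the evidence gathered so far back toward the prior,
        # so older observations count for less than recent ones.
        keep = 1.0 - self.forgetting_rate
        self.alpha = self.prior_alpha + keep * (self.alpha - self.prior_alpha)
        self.beta = self.prior_beta + keep * (self.beta - self.prior_beta)

    def update(self, reward):
        # reward is 1 for a success and 0 for a failure.
        self.alpha += reward
        self.beta += 1 - reward

    def sample(self):
        # Draw a plausible success rate from the current Beta posterior.
        return random.betavariate(self.alpha, self.beta)


def thompson_step(arms, pull):
    """Pick the arm with the highest sampled rate, observe it, then update.

    `pull` is a caller-supplied function mapping an arm index to a 0/1 reward.
    """
    chosen = max(range(len(arms)), key=lambda i: arms[i].sample())
    reward = pull(chosen)
    for arm in arms:
        arm.decay()  # every arm forgets a little at each step
    arms[chosen].update(reward)
    return chosen, reward
```

With `forgetting_rate=0` this reduces to plain Thompson sampling. Larger values cap how much evidence an arm can accumulate (roughly `1 / forgetting_rate` observations), which keeps the posterior wide enough to notice when an arm's true rate drifts.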
I updated the post to mention Thompson sampling, added a note about using only recent data as a potential improvement, and linked to this comment. Thanks for pointing this out!