by bruce343434 1822 days ago
Between the emojis in the headings and the 2009-era memes, this was a bit of a cringy read. Also, the author seems to avoid at all costs going into depth about the actual implementation of OPE, and I still don't quite understand how I would go about implementing it. Machine learning based on past A/B tests that finds similarities between the UI changes???
2 comments

Author here! Every method I described in the post is implemented in the pip library I used there.

In case you missed it: https://github.com/banditml/offline-policy-evaluation

Yeah, me too.

My biggest question is: where do you get the user data to run the simulation? Take the simple push example: if to date you've only sent pushes on day 1 and you want to explore days 2, 3, 4, 5, etc., where does that user response data come from? It seems like you need to collect the data first, and only then can you simulate various permutations of a policy. But then why not just run a multi-armed bandit?

The author discusses that in the post: the logged data has to come from a policy that let every possibility run with some nonzero probability.
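
To make that requirement concrete, here is a minimal sketch of inverse propensity scoring (IPS), one common OPE estimator, applied to the push example. It is not the API of the linked library, and everything in it is an assumption made for illustration: the function names, the logging probabilities, and the simulated response rates are all made up. The point is only that each logged reward gets reweighted by (target policy probability / logging policy probability), which is why an action that was never logged cannot be evaluated.

```python
import random

# Hypothetical action space: the day on which the push is sent.
ACTIONS = [1, 2, 3, 4, 5]

# Hypothetical logging policy: mostly day 1, but every other day
# still runs with some nonzero probability.
LOGGING_PROBS = {1: 0.6, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1}

# Made-up response rates, used only to simulate user behaviour.
TRUE_RESPONSE_RATE = {1: 0.10, 2: 0.12, 3: 0.20, 4: 0.15, 5: 0.08}


def collect_logs(n, rng):
    """Simulate n logged interactions as (action, logged_prob, reward)."""
    weights = [LOGGING_PROBS[a] for a in ACTIONS]
    logs = []
    for _ in range(n):
        action = rng.choices(ACTIONS, weights=weights)[0]
        reward = 1.0 if rng.random() < TRUE_RESPONSE_RATE[action] else 0.0
        logs.append((action, LOGGING_PROBS[action], reward))
    return logs


def ips_estimate(logs, target_probs):
    """Estimate the average reward of a target policy from logged data.

    target_probs maps action -> probability under the policy being
    evaluated. Actions missing from it are treated as probability 0.
    """
    total = 0.0
    for action, logged_prob, reward in logs:
        # Reweight each logged reward by how much more (or less) often
        # the target policy would have taken the logged action.
        total += target_probs.get(action, 0.0) / logged_prob * reward
    return total / len(logs)


if __name__ == "__main__":
    rng = random.Random(0)
    logs = collect_logs(200_000, rng)
    # Evaluate "always send on day 3" without ever deploying it.
    print(ips_estimate(logs, {3: 1.0}))
```

If day 3 had a logging probability of 0, no logged row would ever carry information about it and the estimate would be meaningless. A bandit and OPE are not mutually exclusive here: the exploration a bandit does is one way to produce logs like these.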