by vii 2097 days ago
Setting a good objective function is pretty hard. In this context of consumer goods, it is at the intersection of three difficult problems:

- it is equivalent to incentivising salespeople, which is known to be very difficult because short-term incentives often oppose long-term ones

- distinguishing and dealing with spammers, robots and crawlers

- and setting up a stable reinforcement learning behaviour even for the short term, which is tough even without the first two problems

For these reasons, business partners, designers, and others will naturally be very curious about how the bandit affects the customer experience.

Many years ago, to solve this, I made a system that would emit a list of (suboptimal) rules to exploit the opportunities learnt from small A/B test groups (like an epsilon-greedy contextual bandit). These rules were reviewed by relevant stakeholders and then explicitly deployed to production as a configuration change, which allowed for manual consideration of issues in the three areas above that are hard to automate.
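To make the shape of that workflow concrete, here is a minimal sketch. It is not the original system: the names (`Rule`, `choose_variant`, `propose_rules`), the thresholds, and the idea of a "context" being a single segment label are all assumptions for illustration. Exploration happens on a small epsilon fraction of traffic, and the learner only proposes rules; nothing reaches production until someone reviews the list and deploys it as configuration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A candidate rule for human review (hypothetical structure)."""
    context: str   # e.g. a customer segment label
    variant: str   # variant the rule would serve to that segment
    lift: float    # observed mean reward minus the segment's control mean
    samples: int   # number of observations behind the variant's estimate


def choose_variant(context, variants, deployed_rules, epsilon=0.05, rng=random):
    """Epsilon-greedy serving: explore with probability epsilon, else follow reviewed rules.

    variants[0] is assumed to be the control, served when no rule covers the context.
    """
    if rng.random() < epsilon:
        return rng.choice(variants)  # the small exploration (test) group
    return deployed_rules.get(context, variants[0])


def propose_rules(observations, control, min_samples=100, min_lift=0.01):
    """Turn logged (context, variant, reward) tuples into candidate rules."""
    totals = defaultdict(lambda: [0.0, 0])  # (context, variant) -> [reward sum, count]
    for context, variant, reward in observations:
        cell = totals[(context, variant)]
        cell[0] += reward
        cell[1] += 1

    rules = []
    for (context, variant), (reward_sum, count) in totals.items():
        if variant == control or count < min_samples:
            continue
        # .get() does not insert into the defaultdict, so iteration stays safe.
        base_sum, base_count = totals.get((context, control), (0.0, 0))
        if base_count < min_samples:
            continue  # no trustworthy baseline for this context
        lift = reward_sum / count - base_sum / base_count
        if lift >= min_lift:
            rules.append(Rule(context, variant, lift, count))
    # Largest lift first, so reviewers see the biggest opportunities at the top.
    return sorted(rules, key=lambda r: r.lift, reverse=True)
```

The output of `propose_rules` is meant to be read by people. After review, the accepted rules would become the `deployed_rules` mapping (context to variant) shipped as a configuration change, which is where concerns like long-term incentives or bot traffic can be weighed by hand.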

2 comments

Producing a set of impactful decision boundaries as functions and then manually curating the functions reminds me how much work maintaining rule-based systems can be. Moreover, so much time is spent on figuring out which rules might be helpful in the first place - this being partly what makes rule-based systems traditionally brittle (it takes far longer to evolve the rule-based system than to work around the rules).

I really like the idea of models producing functions over values, thanks for sharing that insight.

I love this idea. Is it written somewhere you can share?