by vii
2097 days ago
Setting a good objective function is pretty hard. In this context of consumer goods, it sits at the intersection of three difficult problems:

- it is equivalent to incentivising salespeople, which is known to be very difficult, because short-term incentives are often in opposition to long-term ones
- distinguishing and dealing with spammers, robots and crawlers
- setting up stable reinforcement learning behaviour even for the short term, which is tough even without the first two problems

For these reasons, business partners, designers, and others will naturally be very curious about how the bandit affects the customer experience.

Many years ago, to solve this, I made a system that would emit a list of (suboptimal) rules to exploit the opportunities learnt from small A/B test groups (like an epsilon-greedy contextual bandit). These rules were reviewed by the relevant stakeholders and then explicitly deployed to production as a configuration change, which allowed for manual consideration of issues in the three areas above that are hard to automate.
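
To make the rule-emitting idea concrete, here is a rough sketch of how such a step might look. It is not the original system: `Rule`, `propose_rules`, `min_samples`, `min_lift` and the sample data are all made-up names for illustration, and it assumes the exploration groups log plain (context, arm, reward) tuples. For each context it compares the best-performing arm against the default and proposes a rule only when there is enough data and a positive lift. The output is a list for people to review, not something applied automatically.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A proposed override for human review (hypothetical structure)."""
    context: str
    arm: str
    lift: float   # mean reward of the arm minus mean reward of the default arm
    samples: int  # number of observations backing the proposed arm


def propose_rules(observations, default_arm, min_samples=2, min_lift=0.0):
    """Turn exploration data into reviewable rules.

    observations: iterable of (context, arm, reward) tuples collected from
    the small exploration groups. Nothing here is deployed automatically.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (context, arm) -> [reward sum, count]
    for context, arm, reward in observations:
        cell = totals[(context, arm)]
        cell[0] += reward
        cell[1] += 1

    means = defaultdict(dict)  # context -> {arm: (mean reward, count)}
    for (context, arm), (total, count) in totals.items():
        if count >= min_samples:  # ignore arms with too little data
            means[context][arm] = (total / count, count)

    rules = []
    for context, arms in sorted(means.items()):
        if default_arm not in arms:
            continue  # no baseline to compare against in this context
        baseline = arms[default_arm][0]
        best_arm, (best_mean, best_count) = max(
            arms.items(), key=lambda item: item[1][0]
        )
        lift = best_mean - baseline
        if best_arm != default_arm and lift > min_lift:
            rules.append(Rule(context, best_arm, lift, best_count))
    return rules


# Made-up exploration data: arm "A" is the current default experience.
observations = [
    ("mobile", "A", 0.0), ("mobile", "A", 1.0),
    ("mobile", "B", 1.0), ("mobile", "B", 1.0),
    ("desktop", "A", 1.0), ("desktop", "A", 1.0),
    ("desktop", "B", 0.0), ("desktop", "B", 1.0),
]
rules = propose_rules(observations, default_arm="A")
for rule in rules:
    print(rule)
```

In a setup like this, the emitted list would be what stakeholders review before any rule goes to production as a configuration change. A real version would also want a proper significance test and some filtering of bot traffic before counting rewards.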
I really like the idea of models producing functions over values. Thanks for sharing that insight.