
by vii 2144 days ago

Setting a good objective function is pretty hard. In the context of consumer goods, it sits at the intersection of three difficult problems:

- incentivising salespeople (an equivalent problem), which is known to be very difficult because short-term incentives are often opposed to long-term ones

- distinguishing and dealing with spammers, robots and crawlers

- and setting up stable reinforcement learning behaviour even for the short term, which is tough even without the first two problems

For these reasons, business partners, designers, and others will naturally be very curious about how the bandit affects the customer experience.

Many years ago, to solve this, I built a system that would emit a list of (suboptimal) rules to exploit the opportunities learnt from small A/B test groups (like an epsilon-greedy contextual bandit). These rules were reviewed by the relevant stakeholders and then explicitly deployed to production as a configuration change, which allowed for manual consideration of issues in the three areas above that are hard to automate.
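
To make the idea concrete, here is a minimal sketch of how such a pipeline might look. It is not the system described above: the function names, the `min_samples` threshold, the string-valued contexts and the JSON rule format are all assumptions chosen for illustration. An epsilon-greedy policy collects rewards per (context, arm) pair, and a separate step turns the learnt statistics into explicit rules that a person can review before anything is deployed.

```python
import json
import random
from collections import defaultdict


def make_stats():
    """Map (context, arm) -> (total_reward, sample_count). Hypothetical layout."""
    return defaultdict(lambda: (0.0, 0))


def mean_reward(stats, context, arm):
    """Observed mean reward for an arm in a context (0.0 if never tried)."""
    total, count = stats[(context, arm)]
    return total / count if count else 0.0


def choose_arm(context, arms, stats, epsilon=0.1, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(arms)
    return max(arms, key=lambda arm: mean_reward(stats, context, arm))


def record(stats, context, arm, reward):
    """Fold one observed reward into the running statistics."""
    total, count = stats[(context, arm)]
    stats[(context, arm)] = (total + reward, count + 1)


def emit_rules(stats, arms, min_samples=100):
    """Turn learnt statistics into explicit rules for human review.

    Only arms with at least `min_samples` observations are considered,
    so contexts with too little data produce no rule at all.
    """
    rules = []
    contexts = sorted({context for (context, _arm) in stats})
    for context in contexts:
        candidates = [a for a in arms if stats[(context, a)][1] >= min_samples]
        if not candidates:
            continue
        best = max(candidates, key=lambda arm: mean_reward(stats, context, arm))
        rules.append({
            "if_context": context,
            "then_show": best,
            "observed_mean_reward": round(mean_reward(stats, context, best), 4),
            "samples": stats[(context, best)][1],
        })
    return rules


def rules_to_config(rules):
    """Serialise rules as JSON, i.e. the artefact stakeholders would review."""
    return json.dumps(rules, indent=2)
```

The point of the split is that `choose_arm` only ever runs on the small test groups, while production traffic is served from the reviewed output of `rules_to_config`. Stakeholders can delete or edit any rule before the configuration change ships, which is where concerns such as long-term incentives or bot traffic can be applied by hand.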

2 comments

alextheparrot 2144 days ago

Producing a set of impactful decision boundaries as functions and then manually curating the functions reminds me of how much work maintaining rule-based systems can be. Moreover, so much time is spent on figuring out which rules might be helpful in the first place - this being partly what makes rule-based systems traditionally brittle (it takes far longer to evolve the rule-based system than to work around the rules).

I really like the idea of models producing functions over values, thanks for sharing that insight.


sanj 2144 days ago

I love this idea. Is it written somewhere you can share?
