
There are often times when you have n possible providers of some service y, each with its own strengths and weaknesses. If you have some ground-truth signal (I used follow-on costs, which are linked to quality), then you can model the providers as bandit arms and use something like UCB1 to choose which one to use. If you apply this separately to every individual customer, you end up learning the best vendor for each customer, which is more efficient than picking a single 'best all-around' vendor for all customers. So the pattern here is: if you have n_service_providers, n_customers, and a value signal to optimize, then MAB may be the place to go for some quick gains. Of course, if you have a much larger space to explore than just n_service_providers, for instance because you want to model combinations of choices, then using something like a NN to learn the value function over that space is also a great way to go.
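
To make the per-customer idea concrete, here is a minimal sketch of UCB1 with one independent bandit per customer. It is not the setup I actually ran, just an illustration under some assumptions: the reward is scaled to [0, 1] (for example, a follow-on cost mapped so that lower cost means higher reward), and the `true_quality` table, the customer names, and the `simulate` helper are all hypothetical stand-ins for real feedback.

```python
import math
import random


class UCB1:
    """UCB1 bandit over a fixed set of arms (providers); rewards assumed in [0, 1]."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # times each provider was chosen
        self.totals = [0.0] * n_arms  # summed reward per provider

    def select(self):
        # Try every provider once before the confidence bound is defined.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)

        def score(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(t) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


def simulate(true_quality, rounds, seed=0):
    """Run one bandit per customer against hypothetical Bernoulli rewards.

    true_quality maps customer -> list of success probabilities, one per
    provider. In a real system this is unknown and the reward comes from
    the observed value signal instead.
    """
    rng = random.Random(seed)
    n_providers = len(next(iter(true_quality.values())))
    bandits = {customer: UCB1(n_providers) for customer in true_quality}
    for _ in range(rounds):
        for customer, probs in true_quality.items():
            arm = bandits[customer].select()
            reward = 1.0 if rng.random() < probs[arm] else 0.0
            bandits[customer].update(arm, reward)
    return bandits


if __name__ == "__main__":
    # Made-up numbers: each customer is best served by a different provider.
    quality = {"customer_a": [0.9, 0.1, 0.1], "customer_b": [0.1, 0.1, 0.9]}
    for customer, bandit in simulate(quality, rounds=2000).items():
        print(customer, bandit.counts)
```

Keeping a separate bandit per customer is the simplest way to get per-customer choices, but it shares nothing between customers, so every new customer starts from scratch. That cost is part of why a learned value function starts to look attractive once the space gets large.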