by sytelus
2792 days ago
One of the most interesting parts of the paper is how RL is used, especially here with Horizon, where one of the goals seems to be handling problems where a simulator isn't available. One such problem is push notifications: Historically, we have used supervised learning models
for predicting click through rate (CTR) and likelihood
that the notification leads to meaningful interactions. We introduced a new policy that uses Horizon to train a
Discrete-Action DQN model for sending push notifications
to address the problems above. The Markov Decision Process
(MDP) is based on a sequence of notification candidates
for a particular person. The actions here are sending and
dropping the notification, and the state describes a set of features about the person and the notification candidate. There
are rewards for interactions and activity on Facebook, with
a penalty for sending the notification to control the volume
of notifications sent. The policy optimizes for the long term
value and is able to capture incremental effects of sending
the notification by comparing the Q-values of the send and
don’t send action.
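
To make the send/drop comparison concrete, here is a rough sketch of how I read that setup. This is not Horizon's API: the function names, the penalty value, and the discount factor are all made up for illustration, and the Q-values would come from the trained Discrete-Action DQN rather than being passed in by hand.

```python
# Illustrative sketch only: names and constants are assumptions, not Horizon code.

SEND, DROP = 0, 1  # the two discrete actions from the quoted MDP


def reward(interaction_value, action, send_penalty=0.25):
    """Reward for interactions/activity, minus a fixed cost when we send.

    send_penalty is a hypothetical constant; the paper only says a penalty
    exists to control notification volume.
    """
    return interaction_value - (send_penalty if action == SEND else 0.0)


def td_target(r, next_q_values, gamma=0.99):
    """Standard one-step DQN target: r + gamma * max_a' Q(s', a').

    next_q_values is None at the end of a person's candidate sequence,
    in which case the target is just the immediate reward.
    """
    if next_q_values is None:
        return r
    return r + gamma * max(next_q_values)


def incremental_value(q_send, q_drop):
    """Estimated long-term gain of sending over not sending."""
    return q_send - q_drop


def decide(q_send, q_drop):
    """Send only when sending has the higher long-term value."""
    return SEND if incremental_value(q_send, q_drop) > 0 else DROP
```

The point is that the decision depends on the gap between the two Q-values, not on a predicted CTR alone: a notification the person would have acted on anyway has a small gap, so the send penalty tips it toward being dropped.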