by patrick_halina 2006 days ago
RL is a good theoretical fit for personalization: given a user state, select an action that maximizes a long-term reward (e.g. revenue or engagement). Building the implementations is tricky because, unlike Go/Chess/Atari, it’s hard to simulate humans. So you have to train the agents offline on batches of data (i.e. historical data from the agent’s past actions). This is challenging because you don’t get as many chances to try different hyperparameters. It’s starting to be used more in industry, though.
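To make the offline part concrete, here is a minimal sketch of scoring a candidate policy purely from logged data. It uses inverse propensity scoring (IPS), which is one common off-policy estimator and not necessarily what any particular production system does. It assumes each log row records the probability with which the logging policy chose the action; the contexts, actions, `target_policy`, and log rows are all made up for illustration.

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a policy's average reward.

    logs: iterable of (context, action, reward, propensity) tuples, where
          propensity is the probability the logging policy chose `action`.
    target_policy: function (context, action) -> probability that the
          policy being evaluated would choose `action` in `context`.
    """
    total = 0.0
    n = 0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / propensity
        total += weight * reward
        n += 1
    return total / n if n else 0.0


def target_policy(context, action):
    # Hypothetical deterministic policy: "A" for new users, "B" otherwise.
    preferred = "A" if context == "new" else "B"
    return 1.0 if action == preferred else 0.0


# Hypothetical logs from a uniform-random logging policy over two actions,
# so every propensity is 0.5.
logs = [
    ("new", "A", 1.0, 0.5),
    ("new", "B", 1.0, 0.5),
    ("returning", "B", 0.0, 0.5),
    ("returning", "A", 1.0, 0.5),
]

estimate = ips_estimate(logs, target_policy)
print(estimate)
```

Only rows where the logged action matches what the target policy would have done contribute, which is why coverage in the historical data matters so much: an action the old agent never tried cannot be evaluated this way.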
1 comments

I’ve not kept up with the recent developments in this field - is Vowpal Wabbit widely used now? Any competitors? Or do people build their own in-house systems?

Thanks

Vowpal Wabbit is used, but many teams build something in-house on top of their existing rule-based or supervised ML systems.

For greenfield deployments, Azure Personalizer may be a good place to start looking.

On the academic side, this paper provides an overview: https://content.iospress.com/articles/data-science/ds200028

Thank you, the review looks helpful.