| HN Mirror



	by patrick_halina 2006 days ago
	RL is a good theoretical solution for personalization: given a user state, select an action that maximizes a long-term reward (e.g. revenue or engagement). Building an implementation is tricky because, unlike Go, chess, or Atari, humans are hard to simulate. So you have to train the agents offline on batches of data (i.e. historical data from the agent’s past actions). This is challenging because you don’t get as many chances to try different hyperparameters. It’s starting to be used more in industry, though.
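	To make the offline part concrete, here is a minimal sketch of one common way to score a candidate policy from logged data: inverse propensity scoring (IPS). The comment above only says agents are trained on batches of historical data, so the choice of IPS is an assumption, and the states, actions, rewards, and function names are all made up for illustration.

```python
def ips_value(logs, policy):
    """Estimate a policy's average reward from logged interactions.

    logs: list of (state, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy gave to the
          action it actually took.
    policy: function mapping a state to the action it would choose.
    """
    total = 0.0
    for state, action, reward, logging_prob in logs:
        # Only records where the candidate policy agrees with the logged
        # action contribute, reweighted by how likely that action was.
        if policy(state) == action:
            total += reward / logging_prob
    return total / len(logs)


# Hypothetical rewards for each (user state, action) pair.
REWARD = {
    ("new", "discount"): 1.0,
    ("new", "recommend"): 0.0,
    ("returning", "discount"): 0.0,
    ("returning", "recommend"): 1.0,
}

# Hypothetical log: the old system picked each of the two actions with
# probability 0.5, and every (state, action) pair appears equally often.
logs = [(s, a, r, 0.5) for (s, a), r in REWARD.items()] * 25


def always_discount(state):
    return "discount"


def tailored(state):
    return "discount" if state == "new" else "recommend"


print("always_discount:", ips_value(logs, always_discount))
print("tailored:", ips_value(logs, tailored))
```

	On this toy log the tailored policy scores higher than always offering a discount, without either policy ever being run against real users. Real logs are noisier and the estimate can have high variance when the logging probabilities are small.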

1 comments

jmeister 2006 days ago

I’ve not kept up with the recent developments in this field - is Vowpal Wabbit widely used now? Any competitors? Or do people build their own in-house systems?

Thanks


flooo 2006 days ago

Vowpal Wabbit is used, but many teams build something in-house on top of their existing rule-based or supervised ML systems.

For greenfield deployments, Azure Personalizer may be a nice place to start looking.

On the academic side, this paper provides an overview: https://content.iospress.com/articles/data-science/ds200028


jmeister 2003 days ago

Thank you, the review looks helpful.
