


	by jsemrau 2002 days ago
	That's one of these moments in life where you see tech and you know it will change the world, but don't see the problem yet.

2 comments

patrick_halina 2002 days ago

RL is a good theoretical solution for personalization: given a user state, select an action that maximizes a long-term reward (e.g. revenue or engagement). It's tricky to build implementations because, unlike Go/Chess/Atari, it's hard to simulate humans. So you have to train the agents offline with batches of data (i.e. using historical data from the agent's past actions). This is challenging because you don't get as many chances to try different hyperparameters. It's starting to be used more in industry though.
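One common way to make use of such logged batches is off-policy evaluation with inverse propensity scoring (IPS): historical rewards are reweighted by how likely the logging policy was to take each action, which gives an estimate of how a new policy would have performed without deploying it. The sketch below is a minimal illustration, not a description of any particular production system. The user states, actions, reward probabilities, and the simulated log are all hypothetical stand-ins for real historical data.

```python
import random

def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's average reward.

    logs: list of (state, action, reward, propensity) tuples, where propensity
    is the probability the logging policy gave to the logged action.
    """
    total = 0.0
    for state, action, reward, propensity in logs:
        # Only rows where the target policy agrees with the logged action count,
        # reweighted by how rarely the logging policy chose that action.
        if target_policy(state) == action:
            total += reward / propensity
    return total / len(logs)

# Hypothetical click probabilities, used here only to fabricate a toy log.
TRUE_REWARD_PROB = {
    ("new", "promo"): 0.6, ("new", "news"): 0.2,
    ("returning", "promo"): 0.1, ("returning", "news"): 0.5,
}

def simulate_logs(n, seed=0):
    """Stand-in for historical data collected by a uniform-random logging policy."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        state = rng.choice(["new", "returning"])
        action = rng.choice(["promo", "news"])  # each action has propensity 0.5
        reward = 1.0 if rng.random() < TRUE_REWARD_PROB[(state, action)] else 0.0
        logs.append((state, action, reward, 0.5))
    return logs

def candidate_policy(state):
    """Hypothetical policy we would like to evaluate before deploying it."""
    return "promo" if state == "new" else "news"

logs = simulate_logs(20000)
estimate = ips_estimate(logs, candidate_policy)
print(f"IPS estimate of average reward: {estimate:.3f}")
```

In this toy setup the candidate policy's true average reward is 0.5 * 0.6 + 0.5 * 0.5 = 0.55, so the estimate should land close to that. With real logs the propensities have to be recorded at decision time, and the variance of the estimate grows quickly when the logging policy rarely took the actions the new policy prefers.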

jmeister 2002 days ago

I’ve not kept up with the recent developments in this field - is Vowpal Wabbit widely used now? Any competitors? Or do people build their own in-house systems?

Thanks

flooo 2002 days ago

Vowpal Wabbit is used, but many build something in-house into existing rule-based or supervised ML systems.

In greenfield deployments, Azure Personalizer may be a nice place to start looking.

On the academic side, this paper provides an overview: https://content.iospress.com/articles/data-science/ds200028

jmeister 1999 days ago

Thank you, the review looks helpful.

vojta_letal 2002 days ago

Does the world really work like that?

jsemrau 2002 days ago

When the first PC with BASIC launched in the 80s, many people wanted to develop for it.

When the iPhone Appstore launched, many people started to build apps in the ecosystem.

While it might be a bit too early to compare RL to those advances in technology, I personally feel there is huge potential. I might be wrong though, and I am fine with that.

bitL 2002 days ago

RL needs a supercomputer and its code is usually too fragile - making a trivial mistake anywhere (missing a constant multiplication, swapping the order of two consecutive lines of code etc.) would likely lead to your model never converging even if you got everything else right.

chasely 2002 days ago

The hard part of RL for the problems I've encountered in my work is that you need a simulator. Building a reliable and accurate simulator is often an immense undertaking.

dgb23 2002 days ago

Maybe data scientists should team up (more?) with game programmers. They have a ton of experience in building very complex simulations.

Ma8ee 2002 days ago

Which code is not fragile in that sense? I think that is a rather strange criticism.

Iv 2002 days ago

You can do RL on a Raspberry Pi. It depends on what problem you are trying to solve, but not all of them require video analysis and billions of parameters.
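As a rough illustration of how small an RL problem can be, here is a minimal tabular Q-learning sketch that uses only the Python standard library. The corridor environment, its reward, and the hyperparameters are hypothetical choices made for this example; nothing here was measured on a Raspberry Pi, but a table of 12 numbers should be well within reach of very modest hardware.

```python
import random

N_STATES = 6          # corridor cells 0..5; reaching cell 5 ends the episode
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal cell is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy action for every non-terminal cell.
policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q_table[:-1]]
print(policy)
```

The learned greedy policy steps right in every cell, which is the shortest path to the goal.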

cbames89 2002 days ago

Technical point: Value functions that are positive constant multiples of each other result in the same behavior.
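A minimal sketch of that point, assuming a greedy policy over action values: multiplying every value by the same positive constant leaves the argmax in each state unchanged, while a negative constant reverses the ranking. The states, actions, and numbers are hypothetical.

```python
def greedy_policy(q_values):
    """Map each state to the action with the highest value."""
    return {state: max(actions, key=actions.get) for state, actions in q_values.items()}

def scale(q_values, c):
    """Multiply every action value by the constant c."""
    return {s: {a: c * v for a, v in actions.items()} for s, actions in q_values.items()}

# Hypothetical action values for two states.
q = {
    "s0": {"left": 1.0, "right": 2.5},
    "s1": {"left": -0.5, "right": -2.0},
}

same_after_positive_scale = greedy_policy(scale(q, 10.0)) == greedy_policy(q)
same_after_negative_scale = greedy_policy(scale(q, -1.0)) == greedy_policy(q)
print(same_after_positive_scale, same_after_negative_scale)
```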

bitL 2001 days ago

Making a constant multiplication mistake somewhere in the code doesn't imply the new value function would be a constant multiple of the optimal one.
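A small sketch of how this can happen, under stated assumptions: a hypothetical three-transition deterministic MDP, and a bug that multiplies only the bootstrapped term of the Bellman backup by a stray constant (the `bootstrap_scale` parameter is invented for this illustration). Some action values stay the same while others shrink, so the result is not a uniform rescaling and the greedy choice changes.

```python
GAMMA = 0.9

# Hypothetical deterministic MDP: (state, action) -> (reward, next_state).
# A next_state of None marks the end of the episode.
TRANSITIONS = {
    ("A", "small"): (1.0, None),
    ("A", "wait"): (0.0, "B"),
    ("B", "collect"): (3.0, None),
}

def q_iteration(bootstrap_scale=1.0, sweeps=50):
    """Tabular Q-value iteration; bootstrap_scale != 1.0 mimics a stray constant."""
    q = {key: 0.0 for key in TRANSITIONS}
    for _ in range(sweeps):
        for (state, action), (reward, next_state) in TRANSITIONS.items():
            if next_state is None:
                next_value = 0.0
            else:
                next_value = max(v for (s, _), v in q.items() if s == next_state)
            q[(state, action)] = reward + bootstrap_scale * GAMMA * next_value
    return q

def greedy_action(q, state):
    candidates = {a: v for (s, a), v in q.items() if s == state}
    return max(candidates, key=candidates.get)

correct = q_iteration()
buggy = q_iteration(bootstrap_scale=0.3)

print(greedy_action(correct, "A"), correct)
print(greedy_action(buggy, "A"), buggy)
```

With the correct backup, waiting in state A is worth about 2.7 and beats the immediate reward of 1.0. With the stray factor it drops to about 0.81 while the immediate reward is untouched, so the greedy policy switches to "small".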

bonoboTP 2002 days ago

RL isn't new though, the foundational results are about 25 years old.

WanderPanda 2002 days ago

And it feels a bit like it is stalling (at least in continuous control)

cbames89 2002 days ago

In my opinion there's a wide open array of approaches from control that can help with this. Learning for Control is a new conference that looks at this very topic.

stevofolife 2002 days ago

No one said "new". You can apply what you said to PCs and iPhones. Mainframes and Palms existed before them.

dmarchand90 2002 days ago

That's still very analogous to the first PCs. By that point there had been decades of foundational computer work.