by joshuamorton 2960 days ago
>because I feel like that adds state dependent on t to the bandit scenario, which then feels more like MDP.

Yeah, this was pretty much exactly the issue I was running into. I can't justify it to myself without essentially saying "this is just an MDP in disguise", which maybe is the right way to do it. I'm pretty sure you can define a k-armed bandit as an MDP with a single state, where each action corresponds to a machine and every action returns you to that same state.
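
To make the single-state construction concrete, here is a rough Python sketch. It assumes the mean reward of each machine is known and adds a discount factor so the values stay finite; the class name, the example means, and `gamma` are all made up for illustration, not taken from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SingleStateMDP:
    """A k-armed bandit written as an MDP with exactly one state (state 0)."""
    mean_rewards: List[float]  # expected reward per machine (illustrative values)
    gamma: float = 0.9         # discount factor, an assumption of this sketch

    def transition(self, state: int, action: int) -> int:
        # Every action (pulling any machine) returns to the single state.
        return 0

    def q_values(self, iterations: int = 1000) -> List[float]:
        # Value iteration on the one state:
        #   Q(a) = mu_a + gamma * max_a' Q(a')
        q = [0.0] * len(self.mean_rewards)
        for _ in range(iterations):
            v = max(q)
            q = [mu + self.gamma * v for mu in self.mean_rewards]
        return q


def greedy_action(mdp: SingleStateMDP) -> int:
    # Index of the action with the highest Q-value.
    q = mdp.q_values()
    return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    mdp = SingleStateMDP(mean_rewards=[0.1, 0.5, 0.3])
    print(greedy_action(mdp))
```

Because there is only one state, the future-value term is the same for every action, so the greedy policy just picks the machine with the highest mean reward. That is the bandit problem again, minus the part where you have to learn the means.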

So maybe that is the right motivation. But reversing it ("an MDP is just a k-armed bandit problem where sometimes playing a machine breaks it and forces you to play other machines, which can impact how quickly the casino fixes your first machine...") feels forced.

All that said, it's a good article :)