HN Mirror


by j7ake 872 days ago

Why are you using a Markov process, though, to model time-dependent likelihood pathways?

That doesn’t make sense. Your next step depends on much more than knowing your current state S; you also need to account for the history of where you were before.

Or maybe you’re just using technical words with precise meanings to describe a vague, imprecise heuristic?
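A minimal sketch of the history-dependence point, using a hypothetical binary sequence in which each symbol is the XOR of the two before it (the rule and the helper names `generate_xor_sequence` and `conditional_counts` are illustrative assumptions, not anything from the thread). Conditioning on the current symbol alone leaves the next symbol uncertain, while conditioning on the last two symbols determines it exactly:

```python
from collections import Counter, defaultdict


def generate_xor_sequence(n, x0=0, x1=1):
    """Hypothetical second-order process: x[t] = x[t-2] XOR x[t-1]."""
    seq = [x0, x1]
    while len(seq) < n:
        seq.append(seq[-2] ^ seq[-1])
    return seq


def conditional_counts(seq, order):
    """Count next-symbol frequencies given the previous `order` symbols."""
    counts = defaultdict(Counter)
    for t in range(order, len(seq)):
        context = tuple(seq[t - order:t])
        counts[context][seq[t]] += 1
    return counts


seq = generate_xor_sequence(30)

# First order: after a 1, the next symbol is sometimes 0 and sometimes 1.
print(dict(conditional_counts(seq, order=1)[(1,)]))

# Second order: every two-symbol context is followed by exactly one symbol.
print({ctx: dict(c) for ctx, c in conditional_counts(seq, order=2).items()})
```

The flip side is the usual state-augmentation trick: if the state is redefined as the pair (previous, current), the process becomes first-order Markov again, at the cost of a larger state space.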


AndrewKemendo 872 days ago

> time-dependent likelihood pathways

Future reward trajectories are THE core focus of multi-step MDPs; see Sutton [1]:

"Now we consider transitions from state-action pair to state-action pair, and learn the value of state-action pairs. Formally these cases are identical: they are both Markov chains with a reward process. The theorems assuring the convergence of state values under TD(0) also apply to the corresponding algorithm for action values: "
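The sentence quoted above leads into the action-value form of TD(0), the Sarsa update. In the book's first-edition notation (step size α, discount factor γ) it is:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]
```

The update uses only the current pair (s_t, a_t), the reward r_{t+1}, and the next pair (s_{t+1}, a_{t+1}), which is where the Markov assumption enters.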

I wasn't going to differentiate in my original post between sub-types of "cycles" within increasingly complex MDPs for long-sequence reward estimation.

[1] http://incompleteideas.net/book/ebook/node64.html
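For readers following the "future reward trajectories" point: the quantity being estimated is the discounted return, G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ⋯, where γ ∈ [0, 1] is the discount factor. A minimal sketch for a finite, already-observed reward sequence (the function name `discounted_return` is hypothetical):

```python
def discounted_return(rewards, gamma):
    """Return G_0 = sum_k gamma**k * rewards[k] for a finite reward sequence.

    rewards[k] is the reward received k+1 steps after the starting time.
    """
    g = 0.0
    # Accumulate backwards using the recursion G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

Working backwards avoids computing powers of γ explicitly and mirrors the one-step recursion that TD methods bootstrap on.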


j7ake 870 days ago

You’re just quoting from Sutton’s reinforcement learning book, which proposes a learning algorithm under a Markov process assumption.

Markov processes are attractive because they are simple objects with nice properties that admit solid mathematical proofs.

Many mathematical models are studied because they have nice theoretical properties and one can prove theorems about them. This should not be mistaken for an actual mechanistic explanation of complex emergent phenomena like human decisions.
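As one example of the kind of clean result the Markov assumption buys: a finite chain that is irreducible and aperiodic has a unique stationary distribution π satisfying π = πP, and repeatedly applying the transition matrix converges to it. A pure-Python sketch; the function name and the two-state matrix are made up for illustration:

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for pi = pi P on a row-stochastic matrix P (list of lists).

    Converges to the unique stationary distribution when the chain is
    finite, irreducible and aperiodic.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical two-state chain
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary_distribution(P))  # approximately [5/6, 1/6]
```

Plain lists keep it dependency-free; with NumPy one would more typically solve for the left eigenvector of P with eigenvalue 1.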


rramadass 872 days ago

Your question is valid. I think the person is just using bombastic words for something simpler and already well known. A Markov chain is just an FSM with probabilistic transitions, and in the limiting case where each state's outgoing transition has probability 1, it reduces to a deterministic FSM.
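A small sketch of that analogy, treating the FSM as autonomous (no input symbols), which is the reading under which the comparison holds. The class name and the example states are illustrative assumptions. Each state lists its outgoing transitions with probabilities; when every state has a single transition with probability 1, sampling always follows the same path, which is just a deterministic FSM:

```python
import random


class MarkovChain:
    """A finite-state machine whose transitions are chosen at random.

    transitions maps each state to a list of (next_state, probability) pairs.
    """

    def __init__(self, transitions, seed=None):
        for state, edges in transitions.items():
            total = sum(p for _, p in edges)
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"probabilities out of {state!r} sum to {total}")
        self.transitions = transitions
        self.rng = random.Random(seed)

    def step(self, state):
        # The next state depends only on the current state (Markov property).
        targets, weights = zip(*self.transitions[state])
        return self.rng.choices(targets, weights=weights, k=1)[0]

    def run(self, start, n_steps):
        path = [start]
        for _ in range(n_steps):
            path.append(self.step(path[-1]))
        return path


# Deterministic limit: one outgoing edge with probability 1 per state.
light = MarkovChain({"green": [("yellow", 1.0)],
                     "yellow": [("red", 1.0)],
                     "red": [("green", 1.0)]})
print(light.run("green", 4))  # ['green', 'yellow', 'red', 'green', 'yellow']

# Genuinely probabilistic transitions (hypothetical weather model).
weather = MarkovChain({"sun": [("sun", 0.8), ("rain", 0.2)],
                       "rain": [("sun", 0.4), ("rain", 0.6)]}, seed=0)
print(weather.run("sun", 10))
```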
