by chimi
2397 days ago
You're touching on the "difficulty" in verbalizing it. I see what you mean, because you did learn that the heuristic was changing from just a yes or no. I said you can't teach that way, but you clearly learned that way, so I wasn't exactly correct, though I still don't think I'm wrong in practice. I wonder how an AI would perform on the same test. What is the mathematical minimum number of questions on such a test, after the heuristic change, that could guarantee the new heuristic has been learned? I'm curious about the test. Did it have a name? What were they testing you for?
This situation is called the multi-armed bandit problem. In this setup you have a number of actions at your disposal and need to maximise your total reward by choosing the actions that pay off best. But the outcomes are stochastic and the player doesn't know in advance which action is best. They need to 'spend' some time trying out various actions (exploration) and then focus on those that work better (exploitation). In a variant of this problem, the rewards associated with the actions also change over time. It's a very well-studied problem and a simple form of reinforcement learning.
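
To make that concrete, here is a rough sketch of one common strategy, epsilon-greedy, on a toy bandit with yes/no rewards. Everything in it is my own illustration, not the test discussed above: the function name run_bandit, the payout probabilities, epsilon = 0.1 and the seed are all made-up assumptions. The optional constant step_size is one standard way to cope with the variant where rewards drift, since it weights recent outcomes more heavily than old ones.

```python
import random


def run_bandit(true_probs, steps=5000, epsilon=0.1, step_size=None, seed=0):
    """Epsilon-greedy play on a Bernoulli bandit (illustrative sketch).

    true_probs: hidden payout probability of each arm (hypothetical values).
    step_size:  None -> plain sample average; a constant in (0, 1] tracks
                recent rewards instead, which suits drifting payouts.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    estimates = [0.0] * n  # current guess of each arm's payout rate
    counts = [0] * n       # how often each arm has been pulled
    total = 0              # total reward collected

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick a random arm
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit

        # The player only ever sees a yes (1) or a no (0).
        reward = 1 if rng.random() < true_probs[arm] else 0

        counts[arm] += 1
        alpha = step_size if step_size is not None else 1.0 / counts[arm]
        estimates[arm] += alpha * (reward - estimates[arm])
        total += reward

    return estimates, counts, total


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
    print("estimated payout rates:", [round(e, 2) for e in estimates])
    print("pulls per arm:", counts)
    print("total reward:", total)
```

With these settings the player spends roughly a tenth of its pulls exploring and the rest on whichever arm currently looks best, so most pulls should end up on the 0.8 arm. It says nothing about the minimum number of questions needed for a guarantee; with stochastic feedback you only get confidence that grows with the number of pulls.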