| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sdenton4 1829 days ago

Check out Thompson Sampling and multi-armed bandit problems for how this can work out in real life. (I tend to think this approach is much better than genetic algorithms...)

Each 'bandit' is a random boolean outcome, governed by some hidden success probability. Thompson Sampling trades of exploration and exploitation. If there's no successes ever, then all bandits are equally bad, and you just keep exploring randomly until you find some success (or give up). If you do have some success, you can try to exploit it.

For a problem with continuous parameters, you can discretize the parameter space by binning, and then choose randomly within a bin for each trial. 'Exploiting' a particular bin might lead to breaking it into more bins for finer resolution.