by m12k
1816 days ago
I think a major takeaway here is that balancing a reward system so that it rewards more than a single behavior is really hard - it's easy to tip the scales so one behavior completely dominates all others. It's an interesting lens for looking at the heuristic reward system humans have built in (hunger, fear, desire, etc). That system tends to have an adaptation/numbing effect, where repeated rewards of the same type have diminishing returns, and that makes sense because it protects against "gaming the system" by going for one reward to the exclusion of all others.
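To make the numbing point concrete, here is a minimal sketch, not a model of any real system. The behavior names, the weights, and the `w * log(1 + n)` reward curve are all assumptions picked for illustration. A greedy agent spends effort one unit at a time on whatever pays the most right now; with flat rewards the slightly-best behavior takes everything, while with diminishing returns the effort spreads out.

```python
import math

def allocate(weights, steps, diminishing):
    """Greedily spend `steps` units of effort, one at a time, on whichever
    behavior currently offers the largest marginal reward."""
    counts = {name: 0 for name in weights}

    def marginal(name):
        n = counts[name]
        if diminishing:
            # Assumed numbing curve: total reward is w * log(1 + n),
            # so each repeat of the same reward is worth a bit less.
            return weights[name] * (math.log(2 + n) - math.log(1 + n))
        # Flat reward: every repeat is worth exactly as much as the first.
        return weights[name]

    for _ in range(steps):
        best = max(weights, key=marginal)
        counts[best] += 1
    return counts

# Hypothetical behaviors with only slightly different weights.
weights = {"eat": 1.0, "explore": 0.9, "rest": 0.8}

linear = allocate(weights, 30, diminishing=False)
numbed = allocate(weights, 30, diminishing=True)

print("flat rewards:       ", linear)
print("diminishing returns:", numbed)
```

In the flat case a 10% edge is enough for "eat" to win every single step, which is the "one behavior completely dominates" failure. The concave curve is only one of many ways to get the adaptation effect; the point is just that the marginal reward has to drop as a behavior is repeated.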
|
Genetic Algorithms attempt to use this same system over extremely simple "fitness landscapes," where the fitness of an agent is defined by programmers using some simple mathematical formula.
When the fitness function is defined by programmers, instead of emerging from a rich and complex ecosystem, the outcome depends exactly on what the programmers choose. If they fail to see the consequences of their scoring algorithm, that's on them. There's nothing really magical going on; they simply failed to foresee the consequences of their choice.
(As someone who has worked with GAs and agent models, this outcome really doesn't surprise me. I would have said "oops, I need to weight the time less" and re-run it, and not thought twice.)
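A rough sketch of what "weight the time less" could look like in practice. Everything here is hypothetical: the `progress - time_weight * time` formula, the two candidate strategies, and the numbers are made up to show the effect, not taken from the system being discussed.

```python
def fitness(candidate, time_weight):
    # Assumed scoring formula: reward progress, penalize elapsed time.
    return candidate["progress"] - time_weight * candidate["time"]

# Two hypothetical strategies a population might converge on.
candidates = [
    {"name": "give_up_immediately", "progress": 0.0, "time": 1.0},
    {"name": "solve_slowly", "progress": 10.0, "time": 40.0},
]

def winner(time_weight):
    """Return the name of the strategy this fitness function favors."""
    return max(candidates, key=lambda c: fitness(c, time_weight))["name"]

heavy = winner(time_weight=0.5)  # time penalty swamps the progress term
light = winner(time_weight=0.1)  # progress term matters again

print("time_weight=0.5 ->", heavy)
print("time_weight=0.1 ->", light)
```

With the heavy time penalty, doing nothing quickly scores better than actually solving the task, so selection does exactly what the formula asks for. Turning one weight down and re-running flips the result, which is the whole fix.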