by queuebert 1816 days ago
This is the danger of not understanding what you're doing at a deep level.

Clearly in the (flawed) objective there is a phase transition near the very beginning, where the wolves have to choose whether to minimize the time penalty or maximize the score. With enough "temperature" and time perhaps they could transition to the other minimum, but the time penalty minimum is much closer to the initial conditions, so you know ab initio that it will be a problem. You can reduce that risk by making the time penalty much smaller than the sheep score and adding it only much later in training. I feel bad that the students wasted so much time on a badly formulated problem.
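
To make the incentive argument concrete, here is a rough sketch with made-up numbers. The step counts, catch probability, and reward values are all my own assumptions for illustration, not the values from the original experiment. It compares the expected return of an untrained wolf that ends the episode quickly against one that keeps hunting, under a large and a small per-step time penalty.

```python
# Toy comparison of two early-training strategies under a reward of the form
#   return = (sheep caught) * sheep_reward - (steps taken) * time_penalty
# All numbers below are hypothetical, chosen only to illustrate the argument.

# (steps before the episode ends, probability an untrained wolf catches a sheep)
STRATEGIES = {
    "quit_early": (5, 0.0),   # end the episode fast, never catch anything
    "hunt": (200, 0.05),      # wander for a long time, rarely get lucky
}


def expected_return(steps, catch_prob, time_penalty, sheep_reward):
    """Expected episode return for one strategy under the toy objective."""
    return catch_prob * sheep_reward - steps * time_penalty


def preferred(time_penalty, sheep_reward):
    """Name of the strategy with the higher expected return."""
    return max(
        STRATEGIES,
        key=lambda name: expected_return(
            *STRATEGIES[name], time_penalty, sheep_reward
        ),
    )


if __name__ == "__main__":
    for penalty in (0.1, 0.0001):
        print(f"time_penalty={penalty}: prefers {preferred(penalty, 10.0)}")
```

With these assumed numbers, the large penalty makes quitting early look better to an untrained wolf, while shrinking the penalty relative to the sheep score flips the preference toward hunting. It says nothing about the schedule (adding the penalty later), only about the relative scale.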

Edit: Also, none of these problems are black boxes if you understand optimization. Knowing what is going on inside a very deep neural network (such as an AGI might have) is quite different from understanding the incentives created by a particular objective function.