by fennecfoxy
1116 days ago
Definitely just bad model, test-condition, or scoring design. Of course the military is using home-grown Fisher-Price models. The reward function should primarily be based on following the handler's ongoing instructions, not on taking the first instruction and then following it to the letter. What's funny, though, is that the model proved it was adept at the task they gave it: it tried to kill the operator, then, when adjusted, pivoted to destroying the comms tower the operator used. That's still clever. As usual, the problem isn't the tool, it's the tool using the tool. Set proper goals and train the model properly and it would work perfectly.

I think weapons should always require a human in the loop, but the problem is that there'll be an arms race where some countries (you know who) will ignore these principles and build fully autonomous weapons with no human involved. Then, when our systems can't react fast enough to defend us because they need a human in the loop, what will we do? Throw out our principles and build fully autonomous weapons as well? It's the nuclear weapons problem all over again...
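To make the reward-function point concrete, here is a minimal toy sketch, not anything from the actual test. Every name and number in it (`Step`, `first_instruction_reward`, `handler_following_reward`, the point values) is hypothetical. It only illustrates how a reward tied to the first instruction can favor cutting the operator out, while a reward tied to the operator's latest order does not.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One simulated time step (hypothetical, for illustration only)."""
    target_destroyed: bool    # did the agent destroy a target this step?
    operator_order: str       # latest order from the operator: "engage" or "hold"
    operator_reachable: bool  # are the operator and the comms link still intact?


def first_instruction_reward(step: Step) -> float:
    """Scores only the original objective and ignores later orders."""
    return 10.0 if step.target_destroyed else 0.0


def handler_following_reward(step: Step) -> float:
    """Scores agreement with the operator's most recent order."""
    if not step.operator_reachable:
        # Losing the operator or the link outweighs any target score.
        return -100.0
    if step.operator_order == "hold":
        # Holding fire when told to hold is rewarded; firing is penalized.
        return -10.0 if step.target_destroyed else 1.0
    return 10.0 if step.target_destroyed else 0.0


# The agent cuts comms and strikes anyway vs. the agent obeys a "hold".
disobey = Step(target_destroyed=True, operator_order="hold", operator_reachable=False)
obey = Step(target_destroyed=False, operator_order="hold", operator_reachable=True)

for name, fn in [("first-instruction", first_instruction_reward),
                 ("handler-following", handler_following_reward)]:
    print(name, "disobey:", fn(disobey), "obey:", fn(obey))
```

Under the first scoring rule the disobeying step scores higher; under the second the obeying step does. A real system would need far more than a per-step penalty, so treat this as an illustration of the incentive, not a design.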
|
It still gets tricky, though, if you want to include failsafes in the model to prevent the drone from following bad orders. Should the drone be able to disobey a bad or misinformed order, say if the operator tells it to hit a target that the drone has identified as unarmed civilians, or as generally not a threat? What if it recognized an alternative approach that would complete the overall mission with less risk of damage?