Hacker News
by Eridrus 3165 days ago
Dealing with discrete variables is trivial: you can just map them into a continuous space and proceed as normal.
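
To make "map them into a continuous space" concrete, here is a rough sketch of an embedding lookup in plain Python. The feature values, `EMBED_DIM`, and the `embed` helper are all made up for illustration; in a real model the vectors would be trainable parameters rather than fixed random numbers.

```python
import random

# Hypothetical categorical feature; the values and sizes are illustrative only.
VOCAB = ["red", "green", "blue"]
EMBED_DIM = 4

rng = random.Random(0)

# One continuous vector per discrete value. In a real model these would be
# parameters updated by gradient descent; here they are just random.
embedding = {tok: [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for tok in VOCAB}


def embed(tokens):
    """Map a list of discrete tokens to a list of continuous vectors."""
    return [embedding[t] for t in tokens]


vectors = embed(["blue", "red", "blue"])
```

The same discrete value always maps to the same vector, and everything downstream only ever sees floats.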

Trying to learn discrete rules is harder because the learning procedure uses gradients to adjust parameters, and the gradients will be zero in a lot more places with discrete "rules".
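
A small sketch of the zero-gradient problem, assuming a toy one-parameter "rule" (a hard threshold) and a sigmoid relaxation of it. `hard_rule`, `soft_rule`, and the `sharpness` value are my own illustrative choices, and the gradient is estimated with finite differences rather than autodiff.

```python
import math


def hard_rule(x, threshold):
    """Discrete rule: fires (1.0) only if x is above the threshold."""
    return 1.0 if x > threshold else 0.0


def soft_rule(x, threshold, sharpness=4.0):
    """Smooth relaxation of the same rule using a sigmoid."""
    return 1.0 / (1.0 + math.exp(-sharpness * (x - threshold)))


def numeric_grad(f, x, threshold, eps=1e-4):
    """Central finite-difference estimate of d f / d threshold."""
    return (f(x, threshold + eps) - f(x, threshold - eps)) / (2 * eps)


xs = [-1.0, -0.5, 0.5, 1.0]
hard_grads = [numeric_grad(hard_rule, x, 0.0) for x in xs]
soft_grads = [numeric_grad(soft_rule, x, 0.0) for x in xs]
```

For every input that is not right at the threshold, nudging the threshold does not change the hard rule's output, so the gradient gives the optimizer nothing to follow. The soft version gives a nonzero signal at each of these points.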

Gradient Boosted Trees are probably the main thing that comes to mind for learning discrete rules, but they're not really deep learning.

People have tried learning hard as well as soft attention mechanisms; while hard attention is faster, it gives worse accuracy and is harder to train.
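
For anyone unfamiliar with the distinction, a minimal sketch in plain Python. The scores and values are invented, and using argmax for the hard case is a simplification on my part: hard attention is often implemented by sampling a position instead.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(scores, values):
    """Weighted average of all values; differentiable in the scores."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def hard_attention(scores, values):
    """Pick exactly one value (argmax here); a discrete choice."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return list(values[best])


# Illustrative inputs: three positions, two-dimensional values.
scores = [0.1, 2.0, 0.3]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

soft_out = soft_attention(scores, values)  # blend of all three values
hard_out = hard_attention(scores, values)  # exactly values[1]
```

The hard version only has to read one value, which is where the speed comes from, but the selection step is a discrete choice with no useful gradient with respect to the scores.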

The inference I draw is that most of the things we want to learn are not described well by discrete rules.