A question about the opposite problem - is there a way to do this (and deep learning as a whole) on discrete domains? So far everything I've seen assumes continuous functions so that back-propagation can be performed; I haven't seen anyone use discrete calculus, which has rules similar to the continuous one (see Graham/Knuth/Patashnik). That could open up many more interesting applications...
Dealing with discrete variables is trivial: you can just map them into a continuous space (e.g. with a one-hot encoding or a learned embedding) and proceed as normal.
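To make the "map them into a continuous space" step concrete, here is a minimal sketch in plain Python. The category names, the embedding size and the random initial values are all made up for illustration; in a real network the table would be a trained parameter.

```python
import random

# Hypothetical set of discrete categories (illustrative only).
categories = ["red", "green", "blue"]
index = {name: i for i, name in enumerate(categories)}

def one_hot(name):
    """Map a category to a one-hot vector with one slot per category."""
    vec = [0.0] * len(categories)
    vec[index[name]] = 1.0
    return vec

# An embedding table has one continuous row per category.
# Assumption: random initial values stand in for weights learned by backprop.
rng = random.Random(0)
embedding_dim = 2
table = [[rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in categories]

def embed(name):
    """Look up the continuous vector for a category."""
    return table[index[name]]

def one_hot_times_table(name):
    """Same result as embed(), written as a vector-matrix product."""
    vec = one_hot(name)
    return [sum(vec[i] * table[i][d] for i in range(len(categories)))
            for d in range(embedding_dim)]

print(one_hot("green"))
print(embed("green"), one_hot_times_table("green"))
```

The lookup and the one-hot product give the same vector, which is why an embedding layer can be treated as an ordinary linear layer for the purposes of back-propagation.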
Trying to learn discrete rules is harder because the learning procedure uses gradients to adjust parameters, and the gradients will be zero in a lot more places with discrete "rules".
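A small sketch of the zero-gradient problem, assuming a toy one-parameter "rule" (a threshold) and a made-up four-point dataset. The names `hard_rule`, `soft_rule` and the `sharpness` value are hypothetical and only serve the illustration.

```python
import math

# Made-up data: inputs below 0.5 are labelled 0, inputs above are labelled 1.
data = [(0.1, 0.0), (0.3, 0.0), (0.7, 1.0), (0.9, 1.0)]

def hard_rule(x, threshold):
    """Discrete rule: output jumps from 0 to 1 at the threshold."""
    return 1.0 if x > threshold else 0.0

def soft_rule(x, threshold, sharpness=10.0):
    """Smooth stand-in for the same rule (a sigmoid around the threshold)."""
    return 1.0 / (1.0 + math.exp(-sharpness * (x - threshold)))

def loss(rule, threshold):
    """Squared error of the rule over the toy dataset."""
    return sum((rule(x, threshold) - y) ** 2 for x, y in data)

def numeric_grad(rule, threshold, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(threshold)."""
    return (loss(rule, threshold + eps) - loss(rule, threshold - eps)) / (2 * eps)

start = 0.8  # a poor threshold: the point at x = 0.7 is misclassified
hard_grad = numeric_grad(hard_rule, start)
soft_grad = numeric_grad(soft_rule, start)
print("hard rule gradient:", hard_grad)
print("soft rule gradient:", soft_grad)
```

The hard rule has a non-zero loss at this threshold but a zero gradient, so gradient descent gets no signal about which way to move. The soft version gives a positive gradient, which pushes the threshold down towards the region where all four points are classified correctly.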
Gradient Boosted Trees are probably the main thing that comes to mind, but they're not really deep learning.
People have tried to learn hard vs soft attention mechanisms, and while hard attention is faster, it results in worse accuracy and is harder to train.
The inference I draw is that most of the things we want to learn are not described well by discrete rules.
Can you use the typical approaches to classification? You can define a continuous error function and perform back-propagation using that. If you look at something like Kaggle then deep learning approaches tend to dominate classification challenges just as much as they do regression ones.
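As a rough sketch of what "a continuous error function" for classification can look like, here is softmax cross-entropy in plain Python. The logits and the target class are made-up numbers, and this is only one common choice of loss, not necessarily the one the commenter had in mind.

```python
import math

def softmax(logits):
    """Turn real-valued scores into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Continuous loss for a discrete label: -log p(target)."""
    return -math.log(softmax(logits)[target])

def cross_entropy_grad(logits, target):
    """Gradient of the loss w.r.t. the logits: probabilities minus one-hot."""
    probs = softmax(logits)
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]

logits = [2.0, 0.5, -1.0]  # hypothetical network outputs for three classes
target = 0                 # index of the true class
loss = cross_entropy(logits, target)
grad = cross_entropy_grad(logits, target)
print("loss:", loss)
print("gradient:", grad)
```

The label is discrete, but the loss is a smooth function of the logits, so back-propagation works as usual.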
That's the usual approach, which has its limits. I am specifically curious about discrete domains. Think of it like mixed integer programming - yes, you can estimate a solution using the linear programming relaxation, but that estimate is usually useless. A method designed specifically for mixed integer programming usually yields far better solutions.
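A toy illustration of the gap between a relaxed solution and a true integer one, using a tiny 0/1 knapsack with made-up numbers. The assumptions: the LP relaxation of a knapsack can be solved greedily by value-to-weight ratio, the "estimate" is obtained by rounding fractions down, and the exact answer comes from brute force, which only works because there are three items.

```python
from itertools import product

# Made-up instance: (value, weight) pairs and a weight capacity.
items = [(9, 6), (5, 5), (5, 5)]
capacity = 10

def relaxed_solution(items, capacity):
    """LP relaxation: items may be taken fractionally (greedy by value/weight)."""
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    fractions = [0.0] * len(items)
    remaining = capacity
    for i in order:
        weight = items[i][1]
        take = min(1.0, remaining / weight)
        fractions[i] = take
        remaining -= take * weight
        if remaining <= 0:
            break
    return fractions

def total_value(choice):
    """Value of a (possibly fractional) selection."""
    return sum(c * value for c, (value, _) in zip(choice, items))

def exact_solution(items, capacity):
    """Brute force over all 0/1 selections that fit in the knapsack."""
    feasible = (c for c in product([0, 1], repeat=len(items))
                if sum(x * w for x, (_, w) in zip(c, items)) <= capacity)
    return max(feasible, key=total_value)

relaxed = relaxed_solution(items, capacity)
rounded = [int(f) for f in relaxed]   # round every fraction down
exact = exact_solution(items, capacity)
print("relaxed:", relaxed, total_value(relaxed))
print("rounded:", rounded, total_value(rounded))
print("exact  :", exact, total_value(exact))
```

Here the relaxation reports an unreachable value of 13, rounding it down gives 9, and the true integer optimum is 10 with a completely different set of items. Real mixed integer solvers use techniques such as branch and bound rather than brute force; the sketch only shows why rounding the relaxed answer can mislead.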
Noob here, but isn't a Bayesian network (a DAG) just a specialized neural network? If so, you can use a Dirichlet distribution for the Bayesian network, and that's discrete... unless I'm misunderstanding.
Thank you for the comment and the link. I agree with most of the points listed there. A GAM is a great tool when there is a non-linear and non-monotonic relationship between the response and the independent variables. A GAM has good interpretability, but it is still somewhat difficult to explain in some business environments. For example, in credit scoring, logistic regression with binning is still widely applied.
In my experience, most of the time when people use binning, it's straightforward to demonstrate that their binning+model is equivalent to a restricted form of a more general model (e.g. a common generalized additive / structural equation model). Sometimes binning is useful because it makes the model much easier to estimate.
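A minimal sketch of the "restricted form of a more general model" point, using made-up data and three hand-picked bins. The general model here is a separate straight line per bin; the binned model is the same thing with every slope forced to zero. The helper names (`bin_of`, `fit_bin`, and so on) are hypothetical.

```python
# Toy data (made up for illustration): y is a smooth function of x.
xs = [i / 10 for i in range(30)]   # 0.0 .. 2.9
ys = [x ** 2 for x in xs]

def bin_of(x):
    """Assign x to one of three unit-width bins: [0,1), [1,2), [2,3)."""
    return min(int(x), 2)

def fit_bin(bx, by, allow_slope):
    """Least-squares fit of y = a + b*x inside one bin.
    With allow_slope=False, b is forced to 0, which is exactly binning."""
    n = len(bx)
    mean_x = sum(bx) / n
    mean_y = sum(by) / n
    if not allow_slope:
        return mean_y, 0.0
    var_x = sum((x - mean_x) ** 2 for x in bx)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bx, by))
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope

def fit(allow_slope):
    """Fit every bin and return {bin index: (intercept, slope)}."""
    params = {}
    for b in range(3):
        bx = [x for x in xs if bin_of(x) == b]
        by = [y for x, y in zip(xs, ys) if bin_of(x) == b]
        params[b] = fit_bin(bx, by, allow_slope)
    return params

def predict(params, x):
    intercept, slope = params[bin_of(x)]
    return intercept + slope * x

def sse(params):
    """In-sample sum of squared errors."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys))

binned = fit(allow_slope=False)
general = fit(allow_slope=True)
print("SSE binned :", sse(binned))
print("SSE general:", sse(general))
```

Because the binned model is nested inside the general one, the general fit can never be worse in-sample. Whether the extra flexibility is worth it out of sample is the comparison that should actually be run, e.g. on held-out data.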
However, people's rationale for binning is often that it makes the model better / more interpretable, without them actually testing the more restricted binned model against the more general one. There's certainly something to be said for knowing your audience when choosing a model, though :).
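Logistic regression is in fact obtained by discretizing a continuous latent variable with logistically distributed errors: the observed outcome is 1 when the latent variable crosses a threshold and 0 otherwise. This is the "threshold model". If you assume normal errors instead of logistic ones, you get the probit.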
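A quick simulation sketch of the threshold model, assuming a single fixed value of the linear predictor (the `eta = 0.8` below is an arbitrary made-up number) and a threshold of zero. It draws logistic noise by inverting the logistic CDF and checks how often the latent variable ends up above the threshold.

```python
import math
import random

def sigmoid(z):
    """Logistic function, i.e. the logistic regression link."""
    return 1.0 / (1.0 + math.exp(-z))

def sample_logistic(rng):
    """Draw standard logistic noise by inverting its CDF."""
    u = rng.random()
    while u == 0.0:          # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulate_threshold(eta, n, rng):
    """Fraction of draws where the latent variable eta + noise exceeds 0."""
    hits = sum(1 for _ in range(n) if eta + sample_logistic(rng) > 0.0)
    return hits / n

rng = random.Random(0)
eta = 0.8                    # hypothetical value of the linear predictor x*beta
empirical = simulate_threshold(eta, 100_000, rng)
theoretical = sigmoid(eta)
print("empirical P(y=1):", empirical)
print("sigmoid(eta)    :", theoretical)
```

The two numbers should agree up to sampling noise. Swapping the logistic draw for `rng.gauss(0, 1)` and comparing against the standard normal CDF would give the probit version of the same check.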