Hacker News new | ask | show | jobs
by cgreerrun 958 days ago
You can use the straight through operator!

During the forward pass you sample a discrete outcome given your NN weights to get an error for backprop. During the backward pass you directly propogate through the weights.

This GradTree paper[1] does a good job covering how to do discrete gradient-based optimization (i.e. NNs w/ discrete representations).

Another option is to use a GFlowNet[2]. Then you have a NN policy that takes discrete actions like you're playing an RL game. You're not back-propogating through something that isn't continuous, but you're utilizing a NN to make informed decisions about a problem with a discrete representation.

[1] GradTree (https://arxiv.org/pdf/2305.03515.pdf) [2] GFlowNet (https://arxiv.org/abs/2111.09266)

2 comments

It might also be possible to solve a corresponding convex optimization problem, similar to https://arxiv.org/pdf/2303.03382.pdf
If an appropriate dual problem is found this could be possible.

I'm baffled why Mert Pilanci's work in this area hasn't received more attention. His proofs of a zero duality gap for neural networks are impressive.

Well, the approach currently doesn't apply to architectures being used in practice, and it's not clear, at least to me, if it can scale to large datasets.
Fair points. But given the potential impact this would have in practice, I'm still surprised there isn't more interest in exploring whether it's possible to scale the approach. Transforming neural network training into convex formulations would drastically simplify many current difficulties.
gradtree is cool! thanks for the share!
You're welcome! If you're interested, there is a follow up paper about a method that extends GradTree to a boosted tree as well called GRANDE[1].

[1]https://arxiv.org/abs/2309.17130