| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by cgreerrun 958 days ago

You can use the straight through operator!

During the forward pass you sample a discrete outcome given your NN weights to get an error for backprop. During the backward pass you directly propogate through the weights.

This GradTree paper[1] does a good job covering how to do discrete gradient-based optimization (i.e. NNs w/ discrete representations).

Another option is to use a GFlowNet[2]. Then you have a NN policy that takes discrete actions like you're playing an RL game. You're not back-propogating through something that isn't continuous, but you're utilizing a NN to make informed decisions about a problem with a discrete representation.

[1] GradTree (https://arxiv.org/pdf/2305.03515.pdf) [2] GFlowNet (https://arxiv.org/abs/2111.09266)

2 comments

smaddox 958 days ago

It might also be possible to solve a corresponding convex optimization problem, similar to https://arxiv.org/pdf/2303.03382.pdf

link

ubj 958 days ago

If an appropriate dual problem is found this could be possible.

I'm baffled why Mert Pilanci's work in this area hasn't received more attention. His proofs of a zero duality gap for neural networks are impressive.

link

smaddox 958 days ago

Well, the approach currently doesn't apply to architectures being used in practice, and it's not clear, at least to me, if it can scale to large datasets.

link

ubj 957 days ago

Fair points. But given the potential impact this would have in practice, I'm still surprised there isn't more interest in exploring whether it's possible to scale the approach. Transforming neural network training into convex formulations would drastically simplify many current difficulties.

link

lucidrains 958 days ago

gradtree is cool! thanks for the share!

link

cgreerrun 957 days ago

You're welcome! If you're interested, there is a follow up paper about a method that extends GradTree to a boosted tree as well called GRANDE[1].

[1]https://arxiv.org/abs/2309.17130

link