| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dwrodri 754 days ago

This is the sort of thing I expected to see when Chris Lattner moved to Google and started working on the Swift for Tensorflow project. I am so grateful that someone is making it happen!

I remember being taught how to write Prolog in University, and then being shown how close the relationship was between building something that parses a grammar and building something that generates valid examples of that grammar. When I saw compiler/language level support for differentiation, I the spark went off in my brain the same way: "If you can build a program which follows a set of rules, and the rules for that language can be differentiated, could you not code a simulation in that differentiable language and then identify the optimal policy using it's gradients?"

Best of luck on your work!

3 comments

justinnk 754 days ago

Thanks! You may find DeepProbLog by Manhaeve et al. interesting, which brings together logic programming, probabilistic programming and gradient descent/neural networks. Also, more generally, I believe in the field of program synthesis there is some research on deriving programs with gradient descent. However, as also pointed out in the comment below, gradient descent may not always be the best approach to such problems (e.g., https://arxiv.org/abs/1608.04428).

link

YeGoblynQueenne 753 days ago

>> "If you can build a program which follows a set of rules, and the rules for that language can be differentiated, could you not code a simulation in that differentiable language and then identify the optimal policy using it's gradients?"

What's a "policy" here? In optimal control (and reinforcement learning) a policy is a function from a set of states to a set of actions, each action a transition between states. In a program synthesis context I guess that translates to a function from a set of _program_ states to a set of operations?

What is an "optimal" policy then? One that transitions between an initial state and a goal state in the least number of operations?

With those assumptions in place, I don't think you want to do that with greadient descent: it will get stuck in local minima and fail in both optimality and generalisation.

Generalisation is easier to explain. Consider a program that has to traverse a graph. We can visualise it as solving a maze. Suppose we have two mazes, A and B, as below:

        A               B
  S □ □ ■ □ □ □   S □ □ ■ □ □ □ 
  ■ ■ □ ■ □ ■ □   ■ ■ □ ■ □ ■ □ 
  □ ■ □ ■ □ ■ □   □ ■ □ ■ □ ■ □ 
  □ ■ □ ■ ■ ■ □   □ ■ □ ■ ■ ■ □ 
  □ ■ □ ■ □ □ □   □ ■ □ ■ □ □ □ 
  □ ■ □ ■ □ ■ □   □ ■ □ ■ □ ■ □ 
  □ □ □ □ □ ■ E   E □ □ □ □ ■ □

Black squares are walls. Note that the two mazes are identical but the exit ("E") is in a different place. An optimal policy that solves maze A will fail on maze B and v.v. Meaning that for some classes of problem there is no policy that is optimal for the every instance in the class and finding an optimal solution requires computation. You can't just set some weights in a function and call it a day.

It's also easy to see what classes of problems are not amenable to this kind of solution: any decision problem that cannot be solved by a regular automaton (i.e. one that is no more than regular). Where there's branching structure that introduces ambiguity -think of two different parses for one string in a language- you need a context-free grammar or above.

That's a problem in Reinforcement Learning where "agents" (i.e. policies) can solve any instance of complex environment classes perfectly, but fail when tested in a different instance [1].

You'll get the same problem with program synthesis.

___________

[1] This paper:

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

https://arxiv.org/abs/2107.06277

makes the point with what felt like a very convoluted example about a robotic zoo keeper looking for the otter habitat in a new zoo etc. I think it's much more obvious what's going on when we study the problem in a grid like a maze: there are ambiguities and a solution cannot be left to a policy that acts like a regular automaton.

link

dwrodri 749 days ago

Thank for taking the time to explain such a worked out example. I was indeed picturing something a long the lines of "If you could write a program equivalent to a game where you solve a maze, could you produce a maze-solver program if the game were made in this runtime.

link

usgroup 754 days ago

Not really. The world of Bayesian modelling has much fancier tools: Hamiltonian MC. See MC Stan. There’s also been Gibbs samplers and other techniques which support discrete decisions for donkeys years.

You can write down just about anything as a BUGS model for example, but “identifying the model” —- finding the uniquely best parameters, even though it’s a globally optimisation —- is often very difficult.

Gradient descent is significantly more limiting than that. Worth understanding MC. The old school is a high bar to jump.

link

dwrodri 749 days ago

I wrote a Gibbs Sampler to try and fit a Latent Dirichlet Allocation model on arXiv abstracts many moons ago! I'd probably have to start from primitive stuff if I were to give it another go today.

I agree with everything you've said so far: getting to the point where you can use gradient descent to solve your problem often requires simplifying your model down to the point where you're not sure how well it represents reality.

My lived experience--and perhaps this is just showing my ignorance--I've had a much harder time getting anything Bayesian to scale up to larger datasets and every time I've worked with graphical models it's just such a PITA compared to what we're seeing now where we can slap a Transformer Layer with embeddings and get a decent baseline. The Bitter Lesson has empowered the lazy, proverbially speaking.

Tensorflow has a GPU-accelerated implementation of Black Box Variational Inference, and I've been meaning to revisit that project for some time. No clue about their MC sampler implementations. Then I stumbled across https://www.connectedpapers.com/ and Twitter locked up it's API, so admittedly both of those took a lot of the wind out of my sail.

Currently saving up my money so that I can buy Kevin Murphy's (I think he's on here as murphyk) two new books that released not too long ago https://probml.github.io/pml-book/. The draft PDFs are on the website, but unfortunately I'm one of those people who can't push themselves to actually read a text if it's not something I can hold in my hands.

link

cma 753 days ago

MC: Monte Carlo

link