Fun times with energy-based models


	Fun times with energy-based models (mpmisko.github.io)
	82 points by mpmisko 715 days ago

5 comments

blt 714 days ago

If the author is reading: In the proof, at the end of Step 6, it's confusing that the "uv" term of the integration by parts is suddenly given a range from -∞ to ∞, as if we had previously assumed x ∈ ℝ. But elsewhere in the article, including the examples, we have higher-dimensional x's. I suggest either 1) including the full multidimensional version from the paper, or 2) explicitly mentioning that this is the simple 1D case and that the same result holds in ℝ^n, then referring readers to the paper.


cshimmin 714 days ago

In the multidimensional case of integration by parts, the limit of integration is understood to be the ((d-1)-dimensional) boundary _at infinity_, so writing +/-inf is a reasonable shorthand. In almost all cases, it's used when the term uv can be assumed to be zero (or at least constant) at this boundary.
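
For reference, a sketch of the multidimensional statement on a ball B_R of radius R. The symbols p (the density), ψ_i (the i-th component of a smooth vector field), and n_i (the i-th component of the outward unit normal) are illustrative notation, not taken from the article:

```latex
\int_{B_R} p(x)\,\frac{\partial \psi_i(x)}{\partial x_i}\,dx
  = \oint_{\partial B_R} p(x)\,\psi_i(x)\,n_i(x)\,dS
  \;-\; \int_{B_R} \frac{\partial p(x)}{\partial x_i}\,\psi_i(x)\,dx
```

Assuming p(x)ψ_i(x) decays fast enough that the surface integral vanishes as R → ∞, only the last term survives, which is the role the "uv from -∞ to ∞" term plays in the 1D proof.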


adamnemecek 714 days ago

We are working on a startup that is revisiting the math that underlies EBMs. If you want to work with us or invest, check out these links:

http://traceoid.ai

https://x.com/adamnemecek1/status/1822727041399328839


uoaei 714 days ago

Is there a chance that you would ever consider passionate non-PhD candidates with corporate, startup, and FAANG experience?


adamnemecek 713 days ago

Definitely. DM me on discord (see http://traceoid.ai for invite).


esafak 714 days ago

I think the rationale for using tricks like score matching and contrastive divergence deserves a mention: the partition function is computationally expensive.
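
To make the cost concrete, here is a minimal sketch that estimates the partition function Z = ∫ exp(-E(x)) dx by brute force on a grid. The quadratic toy energy, the grid bounds, and the function names are assumptions chosen for illustration, not anything from the article:

```python
import itertools
import math


def energy(x):
    """Toy energy (an assumption): E(x) = 0.5 * ||x||^2, so Z = (2*pi)^(dim/2)."""
    return 0.5 * sum(v * v for v in x)


def grid_partition_function(dim, points_per_axis=21, lo=-6.0, hi=6.0):
    """Midpoint-rule estimate of Z over the box [lo, hi]^dim.

    Returns (estimate of Z, number of energy evaluations used).
    """
    h = (hi - lo) / points_per_axis
    axis = [lo + (i + 0.5) * h for i in range(points_per_axis)]
    total = 0.0
    evaluations = 0
    # The full grid has points_per_axis ** dim cells: exponential in dim.
    for point in itertools.product(axis, repeat=dim):
        total += math.exp(-energy(point))
        evaluations += 1
    return total * h ** dim, evaluations


for dim in (1, 2, 3):
    z, n = grid_partition_function(dim)
    print(dim, n, z, (2 * math.pi) ** (dim / 2))
```

The evaluation count is `points_per_axis ** dim`, which is already out of reach for image-sized inputs. Score matching sidesteps this because the gradient of log p(x) with respect to x equals -∇E(x), in which Z does not appear.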

Since we're on the subject, what are EBMs good for today?


mpmisko 714 days ago

EBMs show up all over the place, apparently even your classifier is an EBM :) (https://arxiv.org/abs/1912.03263).
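
A minimal sketch of the reinterpretation in that paper, assuming the usual reading where the classifier logits f(x) define a joint energy E(x, y) = -f(x)[y] and a marginal energy E(x) = -logsumexp_y f(x)[y]. The logit values and function names below are hypothetical:

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax(logits):
    """Standard classifier output p(y | x)."""
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits]


def joint_energy(logits, y):
    """E(x, y) = -f(x)[y]."""
    return -logits[y]


def marginal_energy(logits):
    """E(x) = -logsumexp_y f(x)[y]."""
    return -logsumexp(logits)


logits = [2.0, -1.0, 0.5]  # hypothetical f(x) for a single input x
probs = softmax(logits)
# p(y | x) = exp(-E(x, y)) / exp(-E(x)); the partition function cancels.
ratios = [
    math.exp(-joint_energy(logits, y)) / math.exp(-marginal_energy(logits))
    for y in range(len(logits))
]
print(probs)
print(ratios)
```

The two printed lists agree, which is the sense in which a softmax classifier already carries an energy over x that ordinary training simply never uses.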


uoaei 714 days ago

You can take many equivalent perspectives on learning systems, but mostly it reduces to "messing with denominators in Bayes' rule". This is no different.

EBMs today aren't used because first you have to fit the joint model, then you have to fix some inputs, then fit the other inputs in a second optimization step. That's just too much compute for today's workloads compared to feedforward NNs.
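
A minimal sketch of that second optimization step, assuming a joint energy that has already been fitted. The quadratic `energy` below is a hypothetical stand-in, and the finite-difference gradient is used only to keep the example dependency-free:

```python
def energy(x, y):
    """Hypothetical already-fitted joint energy: low when y is close to 3 * x."""
    return 0.5 * (y - 3.0 * x) ** 2


def infer_y(x, y_init=0.0, step=0.1, n_steps=200, eps=1e-5):
    """Predict y for a fixed x by gradient descent on E(x, y) over y."""
    y = y_init
    for _ in range(n_steps):
        # Central finite difference for dE/dy with x held fixed.
        grad = (energy(x, y + eps) - energy(x, y - eps)) / (2.0 * eps)
        y -= step * grad
    return y


print(infer_y(2.0))  # converges toward the minimizer y = 3 * x
```

Each prediction here costs `2 * n_steps` energy evaluations, whereas a feedforward network answers in a single pass, which is the compute gap the comment describes.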


programjames 714 days ago

They're good for reinforcement learning. E.g., Cicero uses piKL, which samples according to:

p ∝ anchor_policy * exp(utility / temperature)

The utility is exactly the same as "energy". The article ignores entropy, but you can add in entropy regularization e.g. in soft actor-critic.
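
A minimal sketch of the formula above for a discrete action set. This only evaluates the expression as written in the comment and is not Cicero's implementation; the function name and example numbers are hypothetical. With the usual convention p ∝ exp(-E / T), the utility corresponds to a negated energy:

```python
import math


def anchored_policy(anchor_policy, utility, temperature):
    """Return p with p[a] proportional to anchor_policy[a] * exp(utility[a] / temperature).

    Assumes anchor_policy has at least one positive entry.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space for numerical stability; zero-probability actions stay at zero.
    log_w = [
        (math.log(p) if p > 0 else float("-inf")) + u / temperature
        for p, u in zip(anchor_policy, utility)
    ]
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]


anchor = [0.7, 0.2, 0.1]   # hypothetical anchor policy over three actions
utility = [0.0, 1.0, 2.0]  # hypothetical utilities for the same actions
print(anchored_policy(anchor, utility, temperature=1.0))
print(anchored_policy(anchor, utility, temperature=100.0))  # close to the anchor
```

A large temperature keeps the result near the anchor policy, while a small one concentrates mass on the highest-utility actions that the anchor still allows.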


jiggawatts 714 days ago

This paper lists the benefits in the introduction: https://proceedings.neurips.cc/paper_files/paper/2019/file/3...

- Simplicity and Stability: An EBM is the only object that needs to be trained and designed. Separate networks are not tuned to ensure balance.

- Sharing of Statistical Strength: Since the EBM is the only trained object, it requires fewer model parameters than approaches that use multiple networks.

- Adaptive Computation Time: Implicit sample generation is an iterative stochastic optimization process, which allows for a trade-off between generation quality and computation time.

- VAEs and flow-based models are bound by the manifold structure of the prior distribution and consequently have issues modelling discontinuous data manifolds, often assigning probability mass to areas unwarranted by the data. EBMs avoid this issue by directly modelling particular regions as high or lower energy.

- Compositionality: If we think of energy functions as costs for certain goals or constraints, summation of two or more energies corresponds to satisfying all their goals or constraints.
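
The compositionality and iterative-sampling points above can be sketched with unadjusted Langevin dynamics on a sum of two energies. The two quadratic energies, the step size, and all names here are assumptions for illustration, not the paper's setup:

```python
import math
import random


def grad_energy_a(x):
    """Gradient of the toy energy E_a(x) = 0.5 * (x - 1)^2 ("stay near 1")."""
    return x - 1.0


def grad_energy_b(x):
    """Gradient of the toy energy E_b(x) = 0.5 * (x - 3)^2 ("stay near 3")."""
    return x - 3.0


def grad_composed(x):
    """Summing energies sums their gradients; both constraints act at once."""
    return grad_energy_a(x) + grad_energy_b(x)


def langevin_sample(grad_energy, n_steps, step_size=0.01, x_init=0.0, rng=random):
    """Unadjusted Langevin dynamics: x <- x - step * dE/dx + sqrt(2 * step) * noise."""
    x = x_init
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - step_size * grad_energy(x) + math.sqrt(2.0 * step_size) * noise
    return x


rng = random.Random(0)
samples = [langevin_sample(grad_composed, n_steps=500, rng=rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # exp(-(E_a + E_b)) is Gaussian with mean 2 and variance 0.5
```

The samples concentrate around the compromise point between the two constraints, and `n_steps` is the knob behind the quality versus compute trade-off: fewer steps give cheaper but less converged samples.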


programjames 714 days ago

As far as I can tell, flow-based models are bound by the exact same requirements as energy-based models (flow = diffusion/normalizing flow/flow-matching models). But they're absolutely right about VAEs. Those are a memetic virus that needs to die off in favor of more theoretically grounded encoders.


slashdave 714 days ago

Also: you are free to model p(x) without worrying about normalization, something that would be required to maximize likelihood.


uoaei 714 days ago

I wish I had more opportunity to use EBMs. Joint distributions seem to be more relevant to our epistemology (vis-à-vis data and what we can say about it) than conditional ones. But the optimization steps for fitting parameters are kind of a dealbreaker because of how many steps it can take, and also because most ML frameworks are aggressively feedforward.


stubbi 714 days ago

Interesting. Since I studied them during my master's, I feel that in the longer term EBMs will be the way forward for AI.
