Energy-Based Model

Feed-forward networks use a finite number of steps to produce a single output [1].

However, what if:

- The problem requires a complex computation to produce its output?

- There are multiple possible outputs for a single input?

An energy function f(x, y) is a scalar-valued function that takes low values when y is compatible with x, and higher values when y is less compatible with x.

Inference with an energy function finds values of y that make f(x, y) small. Note that the energy is used only for inference, not for learning.
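In symbols (a standard formulation; ŷ denotes the inferred output):

```latex
\hat{y} = \operatorname*{argmin}_{y} f(x, y)
```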

(Figure: energy surface of an energy-based model, with data points in blue)

In the example above, the blue dots are data points. As you can see, the data lie where the energy surface is low.

Implicit Function

Unlike a feed-forward model, an EBM is an implicit function:

  • A feed-forward model is an explicit function that calculates y from x.

  • An EBM (Energy-Based Model) is an implicit function that captures the dependency between y and x.

  • Multiple Y can be compatible with a single X.

(Figure: multiple Y compatible with a single X)

  • Energy function that captures the dependencies between x and y

    1. Low energy near the data points

    2. High energy everywhere else

    3. If y is continuous, the energy function f should be smooth and differentiable, so that we can use gradient-based inference algorithms (see the sketch below)

(Figure: an energy function that captures the dependencies between x and y)
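To make gradient-based inference concrete, here is a minimal sketch in plain Python with a hypothetical energy (chosen so that two different y values are compatible with the same x). We hold x fixed and descend f(x, y) with respect to y:

```python
def energy(x, y):
    # Hypothetical smooth, differentiable energy: low when y**2 == x,
    # so both y = +sqrt(x) and y = -sqrt(x) are compatible with x.
    return (y**2 - x)**2

def grad_energy_y(x, y):
    # Analytic gradient of the energy above with respect to y.
    return 4.0 * y * (y**2 - x)

def infer(x, y0, lr=0.01, steps=500):
    # Gradient-based inference: minimize f(x, y) over y while x stays fixed.
    y = y0
    for _ in range(steps):
        y -= lr * grad_energy_y(x, y)
    return y

x = 4.0
print(infer(x, y0=1.0))   # converges near +2.0
print(infer(x, y0=-1.0))  # converges near -2.0: another compatible y
```

Different initializations recover different compatible outputs, which is exactly the multi-modality that a single feed-forward pass cannot express.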

When inference is hard


When inference involves latent variables


Latent-Variable EBM Inference


  • Allowing multiple predictions through a latent variable

  • As the latent variable z varies over a set, y varies over the manifold of possible predictions


  • Useful when there are multiple correct (or plausible) outputs.

(Figure: inference with latent variables)
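In symbols (a standard formulation; E(x, y, z) denotes the latent-augmented energy, and F(x, y) the effective energy after eliminating z):

```latex
\hat{y}, \hat{z} = \operatorname*{argmin}_{y,\, z} E(x, y, z)
\qquad
F(x, y) = \min_{z} E(x, y, z)
```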

Energy-Based Models vs Probabilistic Models

  • A probabilistic model is a special case of an energy-based model (energies are like unnormalized negative log probabilities)

  • Why use an EBM instead of a probabilistic model?

    1. An EBM gives more flexibility in the choice of the scoring function

    2. More flexibility in the choice of objective function for learning

  • From energy to probability: the Gibbs-Boltzmann distribution (β is a positive constant)

(Formulas shown as figures: the Gibbs-Boltzmann distribution and the marginalization over the latent variable; reconstructed below)
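Reconstructed in standard notation (β > 0; the integrals run over all possible outputs y' and latent values z):

```latex
% Gibbs-Boltzmann distribution:
P(y \mid x) = \frac{e^{-\beta f(x, y)}}{\int_{y'} e^{-\beta f(x, y')}}

% Marginalizing over the latent variable (free energy):
F_{\beta}(x, y) = -\frac{1}{\beta} \log \int_{z} e^{-\beta E(x, y, z)}
```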

Seven Strategies to Shape the Energy Function

Explanation in Korean

1. Build the machine so that the volume of low energy stuff is constant

  • PCA

  • K-means

  • GMM

  • square ICA

(Figure: PCA and K-means energy surfaces)
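For example (a sketch up to notation; W holds the principal directions and the c_i are the K-means centroids, names assumed here), both methods define energies whose low-energy region has fixed volume by construction:

```latex
% PCA: reconstruction error through the principal subspace
E_{\mathrm{PCA}}(y) = \lVert W W^{\top} y - y \rVert^{2}

% K-means: squared distance to the nearest centroid
E_{\mathrm{Kmeans}}(y) = \min_{i} \lVert y - c_{i} \rVert^{2}
```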

2. Push down the energy of data points, push up everywhere else

  • Maximum likelihood (needs a tractable partition function; see the derivation below)
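A standard one-line derivation (not spelled out in these notes; θ denotes the trainable parameters) shows why the partition function must be tractable, and how the likelihood gradient pushes energy down at data points and up wherever the model assigns probability:

```latex
-\log P(y) = \beta f(y) + \log \int_{y'} e^{-\beta f(y')}
\quad\Rightarrow\quad
\nabla_{\theta}\bigl(-\log P(y)\bigr)
  = \beta\, \nabla_{\theta} f(y)
  - \beta\, \mathbb{E}_{y' \sim P}\!\left[\nabla_{\theta} f(y')\right]
```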

3. Push down the energy of data points, push up on chosen locations

  • Contrastive divergence

  • Ratio matching

  • Noise contrastive estimation

  • Minimum probability flow

4. Minimize the gradient and maximize the curvature around data points

  • Score matching (see the objective below)
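Assuming this refers to Hyvärinen-style score matching, the objective can be written in terms of the energy f_θ; minimizing J drives the energy gradient toward zero at data points while rewarding positive curvature (a large Hessian trace) there:

```latex
J(\theta) = \mathbb{E}_{y \sim p_{\mathrm{data}}}\!\left[
  \tfrac{1}{2}\,\lVert \nabla_{y} f_{\theta}(y) \rVert^{2}
  - \operatorname{tr}\!\bigl(\nabla_{y}^{2} f_{\theta}(y)\bigr)
\right]
```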

5. Train a dynamical system so that the dynamics go to the manifold

  • Denoising auto-encoder

6. Use a regularizer that limits the volume of space that has low energy

  • Sparse coding

  • Sparse auto-encoder

  • Predictive Sparse Decomposition

(Figure: sparse coding energy surface)
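For reference, here is a standard form of the sparse-coding energy (W is the dictionary and λ the sparsity weight; notation assumed). The ℓ1 penalty on the code z limits how much of the space can reach low energy:

```latex
E(x, z) = \lVert x - W z \rVert^{2} + \lambda \sum_{i} \lvert z_{i} \rvert
\qquad
F(x) = \min_{z} E(x, z)
```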

7. If E(Y)=||Y−G(Y)||^2, make G(Y) as "constant" as possible

  • Contracting auto-encoder, saturating auto-encoder

Personal Thoughts

Personally, I think an EBM can be applied to just about any problem, because an EBM essentially just means a loss function. The concept and terminology of EBMs do not, by themselves, tell you anything.

References

[1] Yann LeCun, lecture: Energy-Based Models and Self-Supervised Learning.