Feed-forward networks use a finite number of steps to produce a single output.
However, what if:
- The problem requires a complex computation to produce its output?
- There are multiple possible outputs for a single input?
An energy function f(x, y) is a scalar-valued function that takes low values when y is compatible with x and higher values when y is less compatible with x.
Inference with an energy function finds values of y that make f(x, y) small. Note that the energy is used only for inference, not for learning.
In the example above, the blue dots are data points. As you can see, the data are concentrated in the low regions of the surface (the regions of lower energy).
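Written compactly, the inference step described above is just an energy minimization over y:

```latex
% Inference with an energy function: pick the y that makes f(x, y) smallest.
\hat{y} = \operatorname*{arg\,min}_{y} f(x, y)
```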
- A feed-forward model is an explicit function that calculates y from x.
- An EBM (Energy-Based Model) is an implicit function that captures the dependency between y and x.
- Multiple y can be compatible with a single x.
- The energy function captures the dependencies between x and y:
  - Low energy near the data points
  - High energy everywhere else
- If y is continuous, the energy function f should be smooth and differentiable, so that we can use gradient-based inference algorithms (see the sketch after this list).
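A minimal sketch of gradient-based inference, assuming a toy, hand-picked quadratic energy (the energy function, step size, and iteration count below are illustrative, not from the notes):

```python
import torch

# Toy energy: a hypothetical choice where y is "compatible" with x when y ≈ 2 * x.
def energy(x, y):
    return ((y - 2.0 * x) ** 2).sum()

def infer(x, steps=200, lr=0.1):
    """Gradient-based inference: descend f(x, y) with respect to y."""
    y = torch.zeros_like(x, requires_grad=True)   # initial guess for y
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        f = energy(x, y)   # scalar energy; must be smooth and differentiable in y
        f.backward()       # df/dy via autograd
        opt.step()         # move y downhill in energy
    return y.detach()

x = torch.tensor([1.0, -3.0])
print(infer(x))  # approaches [2.0, -6.0], the minimum-energy y for this toy f
```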
- Allowing multiple predictions through a latent variable
- As the latent variable z varies over a set, y varies over the manifold of possible predictions
- Useful when there are multiple correct (or plausible) outputs.
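One standard way to write this down (the notation f(x, y, z) below extends the notes' f(x, y) and is my assumption, not spelled out above): the latent variable is minimized out, and inference then proceeds as before.

```latex
% Latent-variable EBM: the effective energy of (x, y) minimizes over the latent z,
% and prediction minimizes the result over y (standard formulation, assumed here).
F(x, y) = \min_{z} f(x, y, z), \qquad \hat{y} = \operatorname*{arg\,min}_{y} F(x, y)
```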
- A probabilistic model is a special case of an energy-based model (energies are like unnormalised negative log probabilities).
- Why use an EBM instead of a probabilistic model?
  - EBMs give more flexibility in the choice of the scoring function
  - More flexibility in the choice of the objective function for learning
- From energy to probability: the Gibbs-Boltzmann distribution (β is a positive constant); see the formula below.
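The explicit form of the Gibbs-Boltzmann distribution referred to above (standard form, with β a positive constant):

```latex
% Converting an energy into a conditional probability via the Gibbs-Boltzmann distribution.
% The integral over y' in the denominator is the (often intractable) partition function.
P(y \mid x) = \frac{e^{-\beta f(x, y)}}{\int_{y'} e^{-\beta f(x, y')}\, dy'}
```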
Simple examples of energy-based models:
- PCA
- K-means
- GMM
- square ICA
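To make the connection concrete, here are the energies usually written for two of the examples above (standard formulations; the notes themselves do not spell them out):

```latex
% PCA as an EBM: the energy is the squared reconstruction error under the
% principal subspace (W has orthonormal columns spanning the principal directions).
F_{\mathrm{PCA}}(y) = \lVert y - W W^{\top} y \rVert^{2}

% K-means as an EBM: the energy is the squared distance to the nearest centroid w_k.
F_{\mathrm{Kmeans}}(y) = \min_{k} \lVert y - w_{k} \rVert^{2}
```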
Methods for learning the energy function (a minimal contrastive sketch follows the list):
- Maximum likelihood (needs a tractable partition function)
- Contrastive divergence
- Ratio matching
- Noise contrastive estimation
- Minimum probability flow
- Score matching
- Denoising auto-encoder
- Sparse coding
- Sparse auto-encoder
- Predictive Sparse Decomposition
- Contracting auto-encoder, saturating auto-encoder
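A minimal sketch of the contrastive idea shared by several of the methods above: push down the energy of observed pairs and push up the energy of mismatched pairs. The tiny network, the hinge/margin loss, and the way negatives are drawn are all illustrative assumptions, not a specific method from the list.

```python
import torch
import torch.nn as nn

# Tiny energy network: scores the compatibility of (x, y) pairs.
# The architecture and dimensions here are illustrative assumptions.
class EnergyNet(nn.Module):
    def __init__(self, x_dim=4, y_dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)  # scalar energy per pair

model = EnergyNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
margin = 1.0

# One contrastive update: push down the energy of observed (x, y) pairs,
# push up the energy of mismatched ("negative") pairs, up to a margin.
x = torch.randn(8, 4)
y_pos = torch.randn(8, 2)                 # observed outputs
y_neg = y_pos[torch.randperm(8)]          # crude negatives: shuffled outputs

e_pos = model(x, y_pos)
e_neg = model(x, y_neg)
loss = torch.clamp(e_pos - e_neg + margin, min=0).mean()  # hinge/margin loss

opt.zero_grad()
loss.backward()
opt.step()
```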
Personally, I think an EBM can be applied to just about any problem, because an EBM essentially just means the loss function. The concept and terminology of EBMs do not, by themselves, tell you anything.
[1] Yann LeCun, lecture: "Energy-Based Models and Self-Supervised Learning".