Linear Discriminant Analysis
Dimensionality reduction can be achieved through several methods, which fall into two broad categories: linear and non-linear. Here, the linear method LDA will be presented.
It can be efficient to remove redundant and dependent dimensions by transforming the data from a higher-dimensional space to a lower-dimensional one, thereby avoiding the curse of dimensionality [2]. The curse of dimensionality essentially refers to the complications that arise from having too many dimensions, which makes generalizing from the available data a much harder task. [3]
Theorem 1: LDA tries to construct a lower-dimensional space that maximizes the ratio between the between-class variance and the within-class variance, and thereby maximizes class separability. [2]
- between-class variance: the distance between the means of the different classes.
- within-class variance: the distance between the mean of each class and the observations of that class.
In one equation, the goal is to find the transformation $W$ that maximizes Fisher's criterion $J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$, where $S_B$ is the between-class matrix and $S_W$ is the within-class matrix.
(Optional) In other words, the matrix required to transform the data into the desired lower-dimensional representation is directly proportional to the inverse of the within-class matrix times the difference between the class means. [10]
Mathematically, for two classes: $W \propto S_W^{-1}(\mu_1 - \mu_2)$.
(I don't understand it fully. If it helps with your understanding, congrats.)
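As a small illustration of the statement above, the sketch below computes the two-class Fisher direction $w \propto S_W^{-1}(\mu_1 - \mu_2)$ directly with NumPy. The toy data and variable names are my own choices; this is only a sketch of the idea, not code from the cited sources.

```python
import numpy as np

# Toy two-class data (made-up numbers): rows are observations, columns are features.
X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class matrix: scatter of each class around its own mean, summed over classes.
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Fisher's direction is proportional to S_W^{-1} (mu1 - mu2).
w = np.linalg.solve(S_W, mu1 - mu2)

# Projecting onto w gives a 1-D representation in which the two classes are separated.
print(X1 @ w)
print(X2 @ w)
```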
Now the algorithm for computing LDA will be presented. Formulas are omitted; they can be found in [2]. A minimal code sketch of these steps follows the list.
- From a set of $N$ observations $[x_i]^{N}_{i=1}$, each of which is represented as a row of length $M$. Additionally, the observations are stored in the data matrix $X$ ($N \times M$)
- Calculate the mean for each class $\mu_i$ ($1 \times M$)
- Calculate the mean for all the data $\mu$ ($1 \times M$)
- Calculate the between-class matrix $S_B$ ($M \times M$)
- Calculate the within-class matrix $S_W$ ($M \times M$)
- Calculate the transforming matrix $W$
- Calculate the eigenvalues $\lambda$ and eigenvectors $V$ for $W$
- Sort the eigenvectors in descending order according to their corresponding eigenvalues. The first $k$ eigenvectors will be used as a lower dimensional space ($V_k$)
- Project the original data matrix $X$ onto $V_k$
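The sketch below implements the class-independent steps above with NumPy. The function name `lda_class_independent`, the variable names, and the random toy data are my own choices, and the transforming matrix is taken to be $S_W^{-1} S_B$ (a common choice consistent with the criterion above); treat this as an illustrative sketch rather than the reference implementation from [2].

```python
import numpy as np

def lda_class_independent(X, y, k):
    """Project X (N x M) onto the k leading LDA directions (class-independent)."""
    classes = np.unique(y)
    N, M = X.shape
    mu = X.mean(axis=0)                       # mean of all the data (1 x M)

    S_B = np.zeros((M, M))                    # between-class matrix
    S_W = np.zeros((M, M))                    # within-class matrix
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)                # class mean (1 x M)
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += Xc.shape[0] * diff @ diff.T
        S_W += (Xc - mu_c).T @ (Xc - mu_c)

    # Transforming matrix W = S_W^{-1} S_B; its eigenvectors span the LDA space.
    W = np.linalg.inv(S_W) @ S_B
    eigvals, eigvecs = np.linalg.eig(W)

    # Sort eigenvectors by eigenvalue in descending order and keep the first k.
    order = np.argsort(eigvals.real)[::-1]
    V_k = eigvecs[:, order[:k]].real

    return X @ V_k                            # projected data (N x k)

# Example usage on random data with 3 classes and 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4)) + np.repeat(np.arange(3), 30)[:, None]
y = np.repeat(np.arange(3), 30)
print(lda_class_independent(X, y, k=2).shape)   # (90, 2)
```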
The algorithm shown above is the class-independent (CI) technique: only one global transformation matrix is calculated, and all of the observations are projected onto its eigenvectors. For the class-dependent (CD) technique, the within-class matrix, transformation matrix, eigenvalues, and eigenvectors need to be computed separately for each class. Then each observation is projected onto its class' lower-dimensional space.
Key takeaway: CD is computationally more expensive than CI, and it may affect the singularity of the within-class matrices. [2] Other sources [8][9] suggest that CD is more precise, because each transformation matrix is calculated from the discrimination information contained in the respective class, instead of one single transformation matrix being constructed for all the classes.
Potential problems: (These problems can be fixed; solutions are given in [2] as well as in other sources.)
LDA does not work on classes that are not linearly separable: if the means of the classes are approximately equal, then the between-class matrix $S_B$ will be (close to) zero, which means that no useful projection onto a lower dimension can be found.
Another problem that can be encountered while applying LDA is the singularity problem. It occurs when the within-class matrix is singular, that is, when the determinant of $S_W$ is zero and $S_W$ cannot be inverted.
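One common remedy, regularized LDA (one of the solutions discussed in [2]), is to add a small multiple of the identity matrix to $S_W$ before inverting it. The helper below is a minimal sketch; the function name and the default value of `eps` are my own choices.

```python
import numpy as np

def regularize_within_class(S_W, eps=1e-3):
    """Make a (near-)singular within-class matrix invertible by shrinking it
    towards the identity: S_W + eps * I."""
    return S_W + eps * np.eye(S_W.shape[0])
```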
For optimal performance, LDA assumes that the data has a normal distribution and that the classes have equal covariances. However, research shows that it can also perform reasonably well when these assumptions do not hold. [5]
[1] https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.735.8417&rep=rep1&type=pdf
[2] http://usir.salford.ac.uk/id/eprint/52074/1/AI_Com_LDA_Tarek.pdf
[3] https://www.kdnuggets.com/2017/04/must-know-curse-dimensionality.html
[4] https://link.springer.com/content/pdf/10.1007/978-0-387-78189-1.pdf
[5] https://en.wikipedia.org/wiki/Linear_discriminant_analysis
[6] https://papers.nips.cc/paper/2004/file/86ecfcbc1e9f1ae5ee2d71910877da36-Paper.pdf
[7] https://sthalles.github.io/fisher-linear-discriminant/
[8] https://www.eurasip.org/Proceedings/Eusipco/Eusipco2014/HTML/papers/1569926695.pdf
[9] https://journal.xidian.edu.cn/xdxb/EN/abstract/abstract11596.shtml
[10] https://sthalles.github.io/fisher-linear-discriminant/
The between-class variance ($S_B$): we assume that we have a data matrix $X$ ($N \times M$) and that the observations belong to $c$ classes. We denote $S_{B_i}$ as the between-class variance for the $i$-th class, $S_{B_i} = n_i (\mu_i - \mu)(\mu_i - \mu)^T$, where $n_i$ is the number of observations in the $i$-th class. Computing the mean for the $i$-th class ($\mu_i$) and the mean of all the data ($\mu$) is done as in the algorithm above. The rest of the formula for $S_B$ is the sum over all classes: $S_B = \sum_{i=1}^{c} S_{B_i}$.
The within-class variance ($S_W$) measures the distance between the observations of each class and that class' mean: $S_W = \sum_{i=1}^{c} \sum_{x_j \in \omega_i} (x_j - \mu_i)(x_j - \mu_i)^T$, where $\omega_i$ denotes the set of observations of the $i$-th class.
(. . .)
Algorithm for class-dependent LDA:
- From a set of $N$ observations $[x_i]^{N}_{i=1}$, each of which is represented as a row of length $M$. Additionally, the observations are stored in the data matrix $X$ ($N \times M$)
- Calculate the mean for each class $\mu_i$ ($1 \times M$)
- Calculate the mean for all the data $\mu$ ($1 \times M$)
- Calculate the between-class matrix $S_B$ ($M \times M$)
- for all classes $i = 1, 2, \ldots, c$ do
  - Calculate the within-class matrix for each class $S_{{W}_{i}}$ ($M \times M$)
  - Calculate the transforming matrix for each class $W_i$
  - Calculate the eigenvalues $\lambda^i$ and eigenvectors $V^i$ for each transformation matrix $W_i$ of the respective class
  - Sort the eigenvectors in descending order according to their corresponding eigenvalues. The first $k$ eigenvectors will be used as a lower dimensional space ($V_{k}^{i}$)
  - Project the original data matrix $X$ onto $V_{k}^{i}$
- end for
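Below is a hedged NumPy sketch of the class-dependent variant: each class gets its own within-class matrix, transforming matrix, and lower-dimensional space, and the observations of that class are projected onto it (as described earlier). The function name and the choice $W_i = S_{W_i}^{-1} S_B$ are my own assumptions, mirroring the class-independent sketch above.

```python
import numpy as np

def lda_class_dependent(X, y, k):
    """Return a dict mapping each class to its own k-dimensional projection."""
    classes = np.unique(y)
    N, M = X.shape
    mu = X.mean(axis=0)                            # mean of all the data

    # The between-class matrix is shared by all classes.
    S_B = np.zeros((M, M))
    for c in classes:
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - mu).reshape(-1, 1)
        S_B += Xc.shape[0] * diff @ diff.T

    projections = {}
    for c in classes:                              # "for all classes i = 1..c do"
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W_c = (Xc - mu_c).T @ (Xc - mu_c)        # per-class within-class matrix
        W_c = np.linalg.inv(S_W_c) @ S_B           # per-class transforming matrix
        eigvals, eigvecs = np.linalg.eig(W_c)
        order = np.argsort(eigvals.real)[::-1]     # sort by eigenvalue, descending
        V_k_c = eigvecs[:, order[:k]].real         # per-class lower-dimensional space
        projections[c] = Xc @ V_k_c                # project this class onto its own space
    return projections
```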