LDA is used for classifying objects based on a set of features and for dimensionality reduction. The groups are known in advance, so it is a supervised technique, unlike PCA, which is unsupervised.
It also differs from clustering: in clustering, the rules used to group the data come from the chosen algorithm (e.g. k-means considers distance, while LOF considers density), whereas LDA learns the classification rules from labeled data, much like logistic regression. The goal is to project a dataset onto a lower-dimensional space with good class separability, in order to avoid overfitting (the "curse of dimensionality") and to reduce computational cost.
Assumptions of LDA:
- Groups should be linearly separable.
- Groups should be normally distributed.
- Standardization of the data is optional.
Both LDA and PCA are linear transformation techniques, but in addition to finding the component axes that maximize the variance of the data (PCA), LDA also looks for the axes that maximize the separation between multiple classes.
So, in a nutshell, the goal of LDA is often to project a feature space (a dataset of n-dimensional samples) onto a smaller subspace k (where k ≤ n−1) while maintaining the class-discriminatory information.
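As a minimal sketch of this projection step, the snippet below (not from the original post; it assumes scikit-learn and the Iris dataset mentioned later) reduces 4-dimensional samples to a 2-dimensional discriminant subspace:

```python
# Sketch: project n-dimensional samples onto a k-dimensional subspace
# using scikit-learn's LinearDiscriminantAnalysis (k <= n_classes - 1).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 150 samples, n = 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)  # k = 2 discriminant axes
X_lda = lda.fit_transform(X, y)     # supervised: class labels y are required

print(X.shape, "->", X_lda.shape)   # (150, 4) -> (150, 2)
```

Note that, unlike PCA, `fit_transform` here takes the labels `y`; that is what makes the projection class-aware.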
Comparison of LDA and PCA-
Both appear to separate the data well, but within-class scatter should be smaller in LDA, since LDA explicitly optimizes for separating the classes.
On the IRIS dataset restricted to 2 classes, LDA performs better than PCA in terms of dimensionality reduction.
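One way to check this claim (a rough sketch, not the post's exact experiment; the 2-class subset and the logistic-regression scorer are my assumptions) is to project a 2-class Iris subset to one dimension with each method and score a simple classifier on the projection:

```python
# Sketch comparison: 1-D LDA projection vs 1-D PCA projection on a
# 2-class subset of Iris, scored with cross-validated logistic regression.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
mask = y != 0                       # keep versicolor vs virginica (2 classes)
X2, y2 = X[mask], y[mask]

# LDA uses the labels; PCA does not.
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X2, y2)
X_pca = PCA(n_components=1).fit_transform(X2)

clf = LogisticRegression()
acc_lda = cross_val_score(clf, X_lda, y2, cv=5).mean()
acc_pca = cross_val_score(clf, X_pca, y2, cv=5).mean()
print(f"LDA: {acc_lda:.3f}  PCA: {acc_pca:.3f}")
```

With 2 classes, LDA can produce at most one discriminant axis (k ≤ c−1), which is why both projections above are one-dimensional.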
Detailed reading is available at-