The variables of a real data set usually exhibit relationships (e.g. linear) among one another. PCA is a statistical technique that rotates the original data into new coordinates which are linearly uncorrelated, concentrating the variance into as few coordinates as possible. Each principal component is a linear combination of the original variables. The principal components are the eigenvectors of the covariance matrix; since the covariance matrix is symmetric, these eigenvectors are orthogonal. PCA is a useful tool for visualization, dimensionality reduction, noise removal and data compression in Machine Learning. In this article we walk through the techniques of computing the PCA of a data set.
In MATLAB, we apply the “princomp” function, part of the Statistics Toolbox, to calculate PCA. It can be used in the following way:
[n, m] = size(OriginalData);
XMean = mean(OriginalData);
% compute the mean of each column of OriginalData
XStd = std(OriginalData);
% compute the standard deviation of each column of OriginalData
Data = (OriginalData - repmat(XMean,[n 1])) ./ repmat(XStd,[n 1]);
% standardize OriginalData by subtracting the mean from each observation and dividing by the standard deviation, so the data are centered and scaled
[Coeff, Score, Latent] = princomp(Data);
where the “princomp” function returns “Coeff” as the principal component coefficients, “Score” as the principal component scores, i.e. the representation of “Data” in principal component space, such that its rows correspond to observations and its columns to components, and “Latent” as the vector of eigenvalues of the covariance matrix of “Data”. The coefficients of the principal components are calculated so that the first principal component captures the maximum variance, which we may tentatively think of as the maximum information. The second component is calculated to have the second most variance and, importantly, is uncorrelated (in the linear sense) with the first principal component. Further principal components, if there are any, capture decreasing amounts of variance and are uncorrelated with all other principal components.
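As a quick sanity check (a minimal sketch, assuming the variables Data, Coeff, Score and Latent from the code above are in the workspace), we can verify that “Coeff” and “Latent” are indeed the eigenvectors and eigenvalues of the covariance matrix of “Data”, and that the scores are uncorrelated with decreasing variance:
[V, D] = eig(cov(Data));                       % eigendecomposition of the covariance matrix
[EigVals, order] = sort(diag(D), 'descend');   % sort eigenvalues in decreasing order
V = V(:, order);
disp(norm(EigVals - Latent));                  % close to zero: Latent holds the eigenvalues
disp(norm(abs(V) - abs(Coeff)));               % close to zero: Coeff holds the eigenvectors (up to sign)
disp(corrcoef(Score));                         % close to the identity matrix: scores are uncorrelated
disp(var(Score));                              % equal to Latent': variances are in decreasing order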
PCA is completely reversible, meaning that the original data can be recovered exactly from the full set of principal components. If we keep only the first dim components, some information is lost, and we can measure it with the reconstruction error. To compute the reconstruction error we perform the following code:
C = Coeff(:,1:dim);
ReconstructedData = (Data*C*C') .* repmat(XStd,[n 1]) + repmat(XMean,[n 1]);
% project onto the first dim components, then undo the scaling and centering
Error = sqrt(sum(sum((OriginalData - ReconstructedData).^2)));
where “dim” is the number of principal components we want to retain.
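As a quick check of the reversibility claim above (a minimal sketch, assuming the variables from the earlier code are in the workspace and there are more observations than variables), retaining all m components should give a reconstruction error that is zero up to floating-point round-off:
dim = m;                                 % keep every principal component
C = Coeff(:,1:dim);                      % C*C' is then the identity matrix
ReconstructedData = (Data*C*C') .* repmat(XStd,[n 1]) + repmat(XMean,[n 1]);
disp(sqrt(sum(sum((OriginalData - ReconstructedData).^2))));   % numerically zero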
To estimate the dimensionality of the given data set, we first calculate the ratio of sum(Latent(1:i)) to sum(Latent) for each i < n, and we look for a large gap in the ratio of consecutive eigenvalues. The index at which this gap occurs is our estimate of the dimensionality. Following is MATLAB code for estimating the dimensionality of a given data set:
Threshold = 10;   % gap threshold (example value; a tuning parameter)
for i = 1:numel(Latent)-1
    ExplainedRatio(i) = sum(Latent(1:i))/sum(Latent);   % cumulative proportion of variance
    if Latent(i)/Latent(i+1) > Threshold                % large gap between consecutive eigenvalues
        Ndimension = i;
        break
    end
end
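To see the gap visually (a minimal sketch; the plotting commands are only an illustration, not part of the procedure above), one can plot the consecutive eigenvalue ratios next to the cumulative explained variance:
Ratios = Latent(1:end-1) ./ Latent(2:end);     % consecutive eigenvalue ratios
Cumulative = cumsum(Latent) / sum(Latent);     % cumulative proportion of variance
subplot(1,2,1); plot(Ratios, '-o'); title('Latent(i)/Latent(i+1)');
subplot(1,2,2); plot(Cumulative, '-o'); title('Cumulative explained variance');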
We can also use a cross-validation trick to pick the best dimension, i.e. the one that yields the minimum reconstruction error when the original data are reconstructed from the principal components (by running the algorithm on a validation set). We split the data into 80% for the training set and 20% for the test set, perform PCA on the training data, and plot the reconstruction error as a function of the number of dimensions on both the training set and the test set; a sketch of this procedure is given after the list below. Following is the list of results extracted from our observations:
- One important result about the principal components is that they are “completely uncorrelated”, which we can test by calculating their correlation matrix via the “corrcoef” function in MATLAB.
- PCA compresses as much information as possible into the first principal components. In some cases, only a small number of principal components need to be stored to represent the data well.
- The number of components that capture the majority of the variance is small compared to the number of features.
- PCA is built from quantities such as the sample variance, which are not robust. This means that PCA may be thrown off by outliers.
- Though PCA can cram much of the variance in our data set into fewer variables, it still requires all of the original variables to generate the principal components of future observations, regardless of how many principal components are retained for our application.
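The following is a minimal sketch of the 80/20 experiment described above, assuming “OriginalData” is in the workspace and has more observations than variables; the random split and variable names are illustrative:
[n, m] = size(OriginalData);
order = randperm(n);                              % random shuffle before splitting
nTrain = round(0.8*n);
Train = OriginalData(order(1:nTrain), :);         % 80% training set
Test  = OriginalData(order(nTrain+1:end), :);     % 20% test set
XMean = mean(Train);  XStd = std(Train);          % standardize with the training statistics
TrainStd = (Train - repmat(XMean,[size(Train,1) 1])) ./ repmat(XStd,[size(Train,1) 1]);
TestStd  = (Test  - repmat(XMean,[size(Test,1) 1]))  ./ repmat(XStd,[size(Test,1) 1]);
Coeff = princomp(TrainStd);                       % principal components of the training data
TrainErr = zeros(1,m);  TestErr = zeros(1,m);
for dim = 1:m
    C = Coeff(:,1:dim);
    TrainRec = (TrainStd*C*C') .* repmat(XStd,[size(Train,1) 1]) + repmat(XMean,[size(Train,1) 1]);
    TestRec  = (TestStd*C*C')  .* repmat(XStd,[size(Test,1) 1])  + repmat(XMean,[size(Test,1) 1]);
    TrainErr(dim) = sqrt(sum(sum((Train - TrainRec).^2)));
    TestErr(dim)  = sqrt(sum(sum((Test  - TestRec).^2)));
end
plot(1:m, TrainErr, '-o', 1:m, TestErr, '-s');
legend('Training set', 'Test set');
xlabel('Number of principal components');  ylabel('Reconstruction error');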