
Introduction to Principal Component Analysis
In this article, you will learn in depth about Principal Component Analysis (PCA), a technique used for reducing the dimensionality of data while preserving as much of its variance as possible.
PCA transforms a high-dimensional dataset into a low-dimensional one by projecting it onto new axes. These new axes are called principal components (PCs), and each of them is a linear combination of the original features.
PCA in Machine Learning
PCA is widely used in machine learning and data analysis, where the goal is to retain the most important information while performing the transformation. Why use PCA? First, it reduces computational cost: with fewer features, models train faster. Second, it improves visualization by projecting high-dimensional data onto 2D or 3D. It can also reduce overfitting by discarding low-variance, noisy directions.
How to Perform PCA? Step by Step Guide
Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms data into a new coordinate system where the greatest variance lies on the first axis (principal component), the second greatest on the next axis, and so on. Here’s a detailed step-by-step guide to performing PCA.
Step 1: Standardize the Data
PCA is sensitive to the scale of the features, so we first standardize each feature to have mean 0 and standard deviation 1. Here is the formula for standardization:
\[
X_{\text{std}} = \frac{X - \mu}{\sigma}
\]
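Here is a minimal NumPy sketch of this standardization step; the small array X is only an illustrative assumption (it is the dataset used in the worked example later in this article), and the sample standard deviation (dividing by n-1) is used:

```python
import numpy as np

# A minimal sketch of Step 1, assuming X is a (samples x features) NumPy array.
X = np.array([[1, 2], [3, 3], [4, 5], [5, 7]], dtype=float)

mu = X.mean(axis=0)             # per-feature mean
sigma = X.std(axis=0, ddof=1)   # per-feature sample standard deviation (divides by n-1)
X_std = (X - mu) / sigma        # each feature now has mean 0 and standard deviation 1
```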
Step 2: Compute the Covariance Matrix
To capture how pairs of features vary together, we calculate the covariance matrix. For a dataset with d features, this is a d × d symmetric matrix; in the formula below, n is the number of samples.
Formula
\[
\text{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
\]
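As a rough illustration, the covariance matrix of the standardized data can be computed with NumPy as follows (X_std is assumed to come from the standardization step above):

```python
import numpy as np

# A minimal sketch of Step 2: covariance matrix of the standardized data.
X = np.array([[1, 2], [3, 3], [4, 5], [5, 7]], dtype=float)
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

n = X_std.shape[0]                  # number of samples
cov = (X_std.T @ X_std) / (n - 1)   # d x d symmetric covariance matrix
# Equivalent shortcut: np.cov(X_std, rowvar=False)
```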
Step 3: Compute Eigenvalues and Eigenvectors
This step involves linear algebra: once you have the symmetric covariance matrix from Step 2, you compute its eigenvalues and eigenvectors.
First, build the characteristic equation |A - λI| = 0, whose roots are the eigenvalues (these can be repeated or distinct). Then substitute each eigenvalue λ into the equation (A - λI)V = 0 and solve for the corresponding non-zero eigenvector V, for example by Gaussian elimination.
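In practice you rarely solve the characteristic equation by hand. A small sketch using NumPy's eigendecomposition for symmetric matrices might look like this (the 2 × 2 covariance matrix shown is an illustrative value, matching the worked example later on):

```python
import numpy as np

# A minimal sketch of Step 3: eigenvalues and eigenvectors of the covariance matrix.
cov = np.array([[1.00, 0.95],
                [0.95, 1.00]])

# np.linalg.eigh is intended for symmetric matrices; it returns eigenvalues in
# ascending order and the matching eigenvectors as columns of the second array.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
```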
Step 4: Sort Eigenvalues and Select Principal Components
Sort the eigenvalues in descending order and pick the eigenvector corresponding to the largest eigenvalue first. Choosing the top k eigenvectors reduces the data to k dimensions, as sketched below.
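A minimal sketch of this sorting and selection step, assuming the eigenvalues and eigenvectors come from np.linalg.eigh as above (the numbers are illustrative):

```python
import numpy as np

# A minimal sketch of Step 4: sort eigenvalues (largest first) and keep the top k eigenvectors.
eigenvalues = np.array([0.05, 1.95])        # ascending, as eigh returns them
eigenvectors = np.array([[-0.71, 0.71],
                         [ 0.71, 0.71]])    # columns correspond to the eigenvalues above

order = np.argsort(eigenvalues)[::-1]       # indices of eigenvalues, largest first
k = 1
W_k = eigenvectors[:, order[:k]]            # d x k matrix of the top k eigenvectors
```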
Step 5: Project Data onto New Subspace
In this step you transform the standardized data using the top k eigenvectors selected in Step 4. The projection formula is:
\[
X_{\text{pca}} = X_{\text{std}} \, W_k
\]
where \(W_k\) is the matrix whose columns are the top k eigenvectors.
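A minimal sketch of the projection, assuming X_std and W_k were computed in the previous steps (the values below are illustrative):

```python
import numpy as np

# A minimal sketch of Step 5: project standardized data onto the top-k eigenvectors.
X_std = np.array([[-1.32, -1.01],
                  [-0.15, -0.56],
                  [ 0.44,  0.34],
                  [ 1.02,  1.24]])   # illustrative standardized data
W_k = np.array([[0.71],
                [0.71]])             # illustrative top-1 eigenvector as a column

X_pca = X_std @ W_k                  # transformed data, shape (n_samples, k)
```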
Step 6: Explained Variance (How Much Information is Kept?)
- Explained variance ratio = each eigenvalue / sum of all eigenvalues.
- This ratio helps decide how many PCs to keep (e.g., enough to retain 95% of the variance); see the sketch below.
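A minimal sketch of the explained-variance calculation, assuming the eigenvalues are already sorted in descending order (the values are illustrative):

```python
import numpy as np

# A minimal sketch of Step 6: how much variance each principal component explains.
eigenvalues = np.array([1.95, 0.05])                        # sorted descending

explained_variance_ratio = eigenvalues / eigenvalues.sum()  # e.g. [0.975, 0.025]
cumulative = np.cumsum(explained_variance_ratio)
k = int(np.argmax(cumulative >= 0.95)) + 1                  # smallest k keeping >= 95% of the variance
```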
Solved Example of Principal Component Analysis
Let's solve an example to make the steps above concrete.
Given the following 2D dataset with 4 samples:
\[
X = \begin{bmatrix}
1 & 2 \\
3 & 3 \\
4 & 5 \\
5 & 7 \\
\end{bmatrix}
\]
Perform PCA to reduce the dimensionality to 1 component.
Step 1: Standardize the Data
First, calculate the mean and sample standard deviation (dividing by $n-1 = 3$) of each feature:
\begin{align*}
\mu_1 &= \frac{1+3+4+5}{4} = 3.25 \\
\mu_2 &= \frac{2+3+5+7}{4} = 4.25 \\
\sigma_1 &= \sqrt{\frac{(1-3.25)^2 + \cdots + (5-3.25)^2}{3}} = 1.71 \\
\sigma_2 &= \sqrt{\frac{(2-4.25)^2 + \cdots + (7-4.25)^2}{3}} = 2.22
\end{align*}
Standardized matrix:
\[
X_{\text{std}} = \begin{bmatrix}
-1.32 & -1.01 \\
-0.15 & -0.56 \\
0.44 & 0.34 \\
1.02 & 1.24 \\
\end{bmatrix}
\]
Step 2: Compute Covariance Matrix
\[
\Sigma = \frac{1}{n-1}X_{\text{std}}^T X_{\text{std}} =
\begin{bmatrix}
1.00 & 0.95 \\
0.95 & 1.00 \\
\end{bmatrix}
\]
Step 3: Find Eigenvalues and Eigenvectors
Solve $|\Sigma - \lambda I| = 0$:
\[
\begin{vmatrix}
1.00-\lambda & 0.95 \\
0.95 & 1.00-\lambda \\
\end{vmatrix} = 0
\]
Characteristic equation:
\[
\lambda^2 - 2.00\lambda + 0.10 = 0
\]
Eigenvalues:
\[
\lambda_1 = 1.95, \quad \lambda_2 = 0.05
\]
Corresponding eigenvectors:
\[
\mathbf{v}_1 = \begin{bmatrix} 0.71 \\ 0.71 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -0.71 \\ 0.71 \end{bmatrix}
\]
Step 4: Select Principal Component
Choose the eigenvector with the largest eigenvalue; PC1 alone explains $\lambda_1/(\lambda_1+\lambda_2) = 1.95/2.00 = 97.5\%$ of the variance, so a single component is enough here:
\[
\mathbf{w}_1 = \mathbf{v}_1 = \begin{bmatrix} 0.71 \\ 0.71 \end{bmatrix}
\]
Step 5: Project Data
Project standardized data onto PC1:
\[
X_{\text{pca}} = X_{\text{std}} \cdot \mathbf{w}_1 =
\begin{bmatrix}
-1.32 \times 0.71 + (-1.01) \times 0.71 \\
-0.15 \times 0.71 + (-0.56) \times 0.71 \\
0.44 \times 0.71 + 0.34 \times 0.71 \\
1.02 \times 0.71 + 1.24 \times 0.71 \\
\end{bmatrix}
= \begin{bmatrix}
-1.65 \\
-0.50 \\
0.55 \\
1.60 \\
\end{bmatrix}
\]
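As a rough check on the hand calculation above, here is a minimal end-to-end NumPy sketch of the same example. The intermediate values should agree with the worked steps up to rounding (and up to an arbitrary sign flip of the principal component, since the sign of an eigenvector is not unique):

```python
import numpy as np

# End-to-end sketch of the worked example.
X = np.array([[1, 2], [3, 3], [4, 5], [5, 7]], dtype=float)

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # Step 1: standardize
cov = (X_std.T @ X_std) / (X.shape[0] - 1)             # Step 2: covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)        # Step 3: eigen-decomposition
order = np.argsort(eigenvalues)[::-1]                  # Step 4: sort, largest first
w1 = eigenvectors[:, order[:1]]                        # top eigenvector as a column
X_pca = X_std @ w1                                     # Step 5: project onto PC1

print(np.round(eigenvalues[order], 2))   # approximately [1.95, 0.05]
print(np.round(X_pca.ravel(), 2))        # approximately [-1.65, -0.50, 0.55, 1.60], possibly sign-flipped
```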