HELP! Principal Component Analysis - Dimensionality Reduction

When we talk about PCA, we say that we use it to reduce the dimensionality of the data. For instance, we have 2-d data, and using PCA we reduce it to 1-d.
The first component is chosen so that it captures the maximum variance. What does it mean that the 1st component has the maximum variance?

Also, if we take 3-d data and reduce it to 2-d, will the 1st component be built with maximum variance along the x-axis or the y-axis?

In a nutshell, PCA works by first centering the data at the origin (subtracting the mean from each data point), and then rotating it to be in line with the axes (diagonalizing the covariance matrix into a “variance” matrix, a diagonal matrix whose entries are the variances along the new axes). The components are then sorted so that the diagonal of the variance matrix is in descending order, which translates to the first component having the largest variance, the second having the next largest variance, etc. To reduce the dimensionality, you squish your data by zeroing out the less important components (projecting onto the leading principal components); undoing the rotation and the centering then gives an approximation of the original data.
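
The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the book's code: the function name `pca_reduce` and the toy dataset are made up for the example, and the eigendecomposition of the covariance matrix is assumed as the way to do the rotation.

```python
import numpy as np

def pca_reduce(X, k):
    """Keep the top-k principal components of X (n_samples x n_features)
    and map the result back to the original coordinates."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # 1. center the data at the origin
    cov = np.cov(Xc, rowvar=False)              # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)      # 2. diagonalize (eigh returns ascending order)
    order = np.argsort(eigvals)[::-1]           # 3. sort by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                       # data expressed in the rotated axes
    scores[:, k:] = 0.0                         # 4. zero out the less important components
    X_approx = scores @ eigvecs.T + mean        # 5. undo the rotation and the centering
    return eigvals, eigvecs, X_approx

# Hypothetical 2-d toy data: points scattered around a line, shifted off the origin.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 0.5 * t + 0.1 * rng.normal(size=200)]) + np.array([3.0, -1.0])

eigvals, eigvecs, X_approx = pca_reduce(X, k=1)
print("variance along each component:", eigvals)
print("first principal direction:", eigvecs[:, 0])
```

Here `eigvals` is the diagonal of the “variance” matrix, so `eigvals[0]` is the variance of the data along the first component.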

To answer your questions:

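  1. The first component having the max variance means that its corresponding diagonal entry in the variance matrix is the largest one: no other direction gives a larger variance when the data is projected onto it.
  2. I suppose it depends on what you call your axes. The first component is whichever direction has the largest variance; it lines up with the x-axis or y-axis only if the data happens to be spread mostly along that axis, and in general it is a combination of the original axes.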

Source: Probability and Statistics for Computer Science by David Forsyth.

Let’s imagine that we have the following dataset.
You want to choose principal components that maximize variance. You do the following:

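  • Draw a line through the center (mean) of the dataset
  • Project all data points onto the line you have drawn and measure the variance of the projected points
  • Repeat for other candidate lines and choose the line with the largest variance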

In the picture shown above, it is the line along which the projected points span the largest width. Since the projected points are the most spread apart along that line, it should give you the largest variance if you care to calculate it.
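
The "draw a line, project, compare" idea can be tried out numerically. The sketch below is only an illustration on made-up 2-d data (not the dataset from the picture): it sweeps candidate lines through the mean in 0.1 degree steps, which is an arbitrary choice, and compares the winner with the direction obtained from the covariance matrix.

```python
import numpy as np

# Hypothetical 2-d data spread mostly along a diagonal direction.
rng = np.random.default_rng(1)
t = rng.normal(size=500)
X = np.column_stack([t, t + 0.3 * rng.normal(size=500)])
Xc = X - X.mean(axis=0)                       # lines are drawn through the mean

# Candidate lines: unit vectors at angles from 0 to 180 degrees.
angles = np.linspace(0.0, np.pi, 1801)
directions = np.column_stack([np.cos(angles), np.sin(angles)])

# Project every point onto every candidate line and measure the spread.
projections = Xc @ directions.T               # shape (n_points, n_lines)
variances = projections.var(axis=0, ddof=1)

best = np.argmax(variances)                   # line with the largest variance
best_direction = directions[best]

# Reference: eigenvector of the covariance matrix with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]

print("best line found by the sweep:", best_direction)
print("first principal direction:   ", pc1)
```

The two directions should agree up to sign and the resolution of the sweep, since a line has no preferred orientation.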

That is how you pick the first principal component. To pick the second principal component:

  • Draw a line, but this time it should be orthogonal to the first principal component
  • Project the points onto this line and measure the variance of the projected points
  • Among all such orthogonal lines, choose the one with the largest variance

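For picking PC2 to involve a real choice, the dimension of your data should be greater than 2. The picture above has two dimensions, so the only line orthogonal to PC1 is PC2, and you can reduce the data to 1 dimension with PCA by keeping only PC1.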
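
For data with more than two dimensions, the second step can be sketched as follows. This is a hedged illustration on made-up 3-d data: instead of trying every orthogonal line, it removes the PC1 part from each point and then looks for the direction of largest variance in what is left, which is one common way to enforce the orthogonality constraint. The helper `top_direction` is a name invented for this example.

```python
import numpy as np

# Hypothetical 3-d data: different spread per axis, then mixed by a fixed matrix.
rng = np.random.default_rng(2)
Z = rng.normal(size=(1000, 3)) * np.array([3.0, 1.5, 0.5])
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])
X = Z @ A
Xc = X - X.mean(axis=0)

def top_direction(data):
    """Unit vector along which the (centered) data has the largest variance."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return eigvecs[:, -1]                     # eigh sorts eigenvalues in ascending order

pc1 = top_direction(Xc)

# Remove the part of every point that lies along PC1; what remains is
# orthogonal to PC1, so the best direction in it is automatically orthogonal too.
residual = Xc - np.outer(Xc @ pc1, pc1)
pc2 = top_direction(residual)

print("PC1:", pc1)
print("PC2:", pc2)
print("dot product (should be about 0):", pc1 @ pc2)
```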

To pick PC2, do we first need to draw a line and then check for the largest variance?
Also, the variance of a principal component is the spread of the points along it, no? So for C2 in the above image, the spread of points (variance) is in the North-West to South-East direction, am I right?

Also, PC2 should be orthogonal to PC1. So should PC3 be orthogonal to both PC1 and PC2, or only to PC2?