Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is an unsupervised machine learning technique that transforms high-dimensional data by exploiting the covariance structure among features. It projects the data onto a lower-dimensional space while preserving as much of the original variance as possible.
Key Characteristics
- Operates via a dual optimization that can be written in Lagrangian form (sketched after this list):
  - Minimizes projection error
  - Maximizes projection variance to retain original information
- Maintains the data's variance; the projected components are not rescaled to unit variance (i.e., no whitening)
- Can be viewed as a simplified (linear) AutoEncoder: a linear encoder/decoder trained with squared reconstruction error recovers the same subspace as PCA
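A minimal sketch of the variance-maximization view in Lagrangian form, using assumed notation ($\mathbf{S}$ for the data covariance matrix, $\mathbf{u}_1$ for the first projection direction, $\lambda_1$ for the Lagrange multiplier):

```latex
% Maximize the projected variance subject to a unit-norm constraint on the direction
\max_{\mathbf{u}_1} \; \mathbf{u}_1^{\top} \mathbf{S}\, \mathbf{u}_1
\quad \text{subject to} \quad \mathbf{u}_1^{\top} \mathbf{u}_1 = 1

% Lagrangian of the constrained problem
L(\mathbf{u}_1, \lambda_1) = \mathbf{u}_1^{\top} \mathbf{S}\, \mathbf{u}_1
  + \lambda_1 \left( 1 - \mathbf{u}_1^{\top} \mathbf{u}_1 \right)

% Stationarity yields the eigenvalue problem; the captured variance equals the eigenvalue
\nabla_{\mathbf{u}_1} L = 0
  \;\Longrightarrow\; \mathbf{S}\, \mathbf{u}_1 = \lambda_1 \mathbf{u}_1 ,
  \qquad \mathbf{u}_1^{\top} \mathbf{S}\, \mathbf{u}_1 = \lambda_1
```

The minimum-projection-error formulation leads to the same eigenvalue problem, which is why the two objectives are equivalent.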
Mathematical Foundation
- The components can be obtained by performing an eigen-decomposition of the covariance matrix with standard solvers (a NumPy sketch follows this list)
- Based on computing the eigenvectors of the data covariance matrix and selecting those with the largest eigenvalues
- The solution involves selecting:
  - the M largest eigenvalues, whose eigenvectors span the retained M-dimensional subspace
  - the (D − M) smallest eigenvalues, whose directions are discarded (their sum equals the residual reconstruction error)
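A minimal NumPy sketch of this procedure; the helper name `pca` and its variable names are illustrative, not from the source:

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center the data so the covariance matrix captures variance around the mean
    mean = X.mean(axis=0)
    Xc = X - mean

    # Covariance matrix of the features (D x D)
    cov = np.cov(Xc, rowvar=False)

    # Symmetric eigen-decomposition; eigenvalues are returned in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep the eigenvectors belonging to the M largest eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]          # (D, M) projection matrix
    explained_variance = eigvals[order]     # variance captured per component

    # Project the centered data into the M-dimensional subspace
    Z = Xc @ components                     # (n_samples, M) scores
    return Z, components, explained_variance, mean

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Z, W, var, mu = pca(X, n_components=2)
    print(Z.shape, var)                     # (200, 2) and the two largest eigenvalues
```

In practice, libraries often compute the same result from an SVD of the centered data matrix, which avoids forming the covariance matrix explicitly.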
Applications
- Feature extraction and dimensionality reduction
- Data compression
- Data visualization
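As an illustrative usage example (not from the source), scikit-learn's `PCA` covers these applications directly; the dataset here is synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic high-dimensional data: 300 samples, 20 correlated features
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(300, 20))

# Dimensionality reduction / visualization: project onto the first 2 components
pca_2d = PCA(n_components=2)
Z = pca_2d.fit_transform(X)                      # (300, 2) scores, e.g. for a scatter plot
print(pca_2d.explained_variance_ratio_)          # fraction of variance kept per component

# Compression: keep enough components to retain ~95% of the variance, then reconstruct
pca_95 = PCA(n_components=0.95)
X_compressed = pca_95.fit_transform(X)
X_restored = pca_95.inverse_transform(X_compressed)
print(X_compressed.shape, np.mean((X - X_restored) ** 2))  # reduced shape and reconstruction error
```

Passing a float to `n_components` tells scikit-learn to keep just enough components to reach that fraction of explained variance, which is a convenient way to trade compression ratio against reconstruction error.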
Computational Efficiency