CCA maximizes Correlation
CCA is a method for finding the directions of maximum correlation between two paired views of the data. Unlike Partial Least Squares, which maximizes covariance, CCA is invariant to the scaling of the features: it is sensitive to the direction of the relationships across modalities rather than to their magnitude.
Given $n$ paired, centered observations collected in the views $X \in \mathbb{R}^{n \times d_x}$ and $Y \in \mathbb{R}^{n \times d_y}$, with within-view covariances $\Sigma_{xx}$, $\Sigma_{yy}$ and cross-covariance $\Sigma_{xy} = \Sigma_{yx}^\top$, CCA can be represented by the following optimization problem:

$$\max_{w_x, w_y} \; w_x^\top \Sigma_{xy} w_y \quad \text{subject to} \quad w_x^\top \Sigma_{xx} w_x = 1, \quad w_y^\top \Sigma_{yy} w_y = 1.$$
By forming the Lagrangian

$$\mathcal{L}(w_x, w_y, \lambda_x, \lambda_y) = w_x^\top \Sigma_{xy} w_y - \frac{\lambda_x}{2}\left(w_x^\top \Sigma_{xx} w_x - 1\right) - \frac{\lambda_y}{2}\left(w_y^\top \Sigma_{yy} w_y - 1\right)$$

and setting its derivatives with respect to $w_x$ and $w_y$ to zero, we obtain the conditions

$$\Sigma_{xy} w_y = \lambda_x \Sigma_{xx} w_x, \qquad \Sigma_{yx} w_x = \lambda_y \Sigma_{yy} w_y,$$

together with the two unit-variance constraints $w_x^\top \Sigma_{xx} w_x = 1$ and $w_y^\top \Sigma_{yy} w_y = 1$.
Again, by left-multiplying the first two conditions by $w_x^\top$ and $w_y^\top$ and substituting the two constraints, we find that $\lambda_x = w_x^\top \Sigma_{xy} w_y = \lambda_y$, so both multipliers equal a common value $\lambda$, the canonical correlation.
Substituting the second condition into the first and setting $\lambda_x = \lambda_y = \lambda$, we can show that the CCA optimization problem is another pair of eigenvalue problems:

$$\Sigma_{xx}^{-1} \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}\, w_x = \lambda^2 w_x, \qquad \Sigma_{yy}^{-1} \Sigma_{yx} \Sigma_{xx}^{-1} \Sigma_{xy}\, w_y = \lambda^2 w_y.$$
CCA can also be expressed as a singular value decomposition of $\Sigma_{xx}^{-1/2} \Sigma_{xy} \Sigma_{yy}^{-1/2}$: the singular values are the canonical correlations $\lambda$, and the canonical directions are recovered from the left and right singular vectors $u, v$ as $w_x = \Sigma_{xx}^{-1/2} u$ and $w_y = \Sigma_{yy}^{-1/2} v$.
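As a concrete illustration, here is a minimal NumPy sketch of the SVD route; the helper names (`inv_sqrt`, `cca_svd`) and the synthetic two-view data are illustrative assumptions, not part of the text.

```python
import numpy as np

def inv_sqrt(S):
    # Symmetric inverse square root via an eigendecomposition.
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca_svd(X, Y):
    # Canonical directions and correlations for two paired views.
    n = X.shape[0]
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
    A, B = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of A @ Sxy @ B are the canonical correlations.
    U, lams, Vt = np.linalg.svd(A @ Sxy @ B)
    return A @ U, B @ Vt.T, lams   # w_x = Sxx^{-1/2} u, w_y = Syy^{-1/2} v

# Illustrative two-view data sharing a 2-dimensional latent signal.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(500, 5))
Y = Z @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(500, 4))
Wx, Wy, lams = cca_svd(X, Y)
# The leading singular value matches the correlation of the projections.
print(lams[0], np.corrcoef(X @ Wx[:, 0], Y @ Wy[:, 0])[0, 1])
```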
PLS can be made equivalent to CCA by whitening the data matrices prior to the computation of the covariance matrices, as this ensures that $\Sigma_{xx}$ and $\Sigma_{yy}$ are identity matrices.
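A quick self-contained check of this claim on synthetic data (all names and the data-generating process below are illustrative): whitening each view turns its within-view covariance into the identity, so the PLS step, an SVD of the cross-covariance, reproduces exactly the CCA correlations.

```python
import numpy as np

def inv_sqrt(S):
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))
X = Z @ rng.normal(size=(2, 6)) + rng.normal(size=(300, 6))
Y = Z @ rng.normal(size=(2, 4)) + rng.normal(size=(300, 4))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
n = X.shape[0]
Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

# CCA correlations: singular values of Sxx^{-1/2} Sxy Syy^{-1/2}.
cca_corrs = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy), compute_uv=False)

# PLS on whitened views: whitening makes the within-view covariances
# identity, so the SVD of the cross-covariance recovers the CCA correlations.
Xw, Yw = Xc @ inv_sqrt(Sxx), Yc @ inv_sqrt(Syy)
pls_svals = np.linalg.svd(Xw.T @ Yw / n, compute_uv=False)
print(np.allclose(pls_svals, cca_corrs))  # True
```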
Although the CCA problem is nonconvex, there are several ways of solving it:
- Iterative optimization using an eigenvalue problem
- Generalized eigenvalue problem
- Block coordinate descent by alternating least-squares regressions (see the sketch after this list)
- Gradient descent
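The block-coordinate-descent route can be sketched as alternating least-squares regressions between the two views, which acts as a power iteration on the eigenvalue problems above and converges to the top canonical pair. The function `cca_als` and the data below are illustrative assumptions.

```python
import numpy as np

def cca_als(X, Y, iters=200):
    """Top canonical pair by alternating least-squares regressions."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    wy = np.random.default_rng(1).normal(size=Y.shape[1])
    for _ in range(iters):
        # Regress the current Y-score on X, then rescale to unit variance.
        wx, *_ = np.linalg.lstsq(Xc, Yc @ wy, rcond=None)
        wx /= np.std(Xc @ wx)
        # Symmetric least-squares step for the other view.
        wy, *_ = np.linalg.lstsq(Yc, Xc @ wx, rcond=None)
        wy /= np.std(Yc @ wy)
    return wx, wy

rng = np.random.default_rng(0)
Z = rng.normal(size=(400, 2))
X = Z @ rng.normal(size=(2, 6)) + rng.normal(size=(400, 6))
Y = Z @ rng.normal(size=(2, 5)) + rng.normal(size=(400, 5))
wx, wy = cca_als(X, Y)
print(np.corrcoef(X @ wx, Y @ wy)[0, 1])  # approx. the first canonical correlation
```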
When $n < d$, i.e. when there are fewer samples than feature dimensions, CCA degenerates because the within-view covariance matrices $\Sigma_{xx}$ and $\Sigma_{yy}$ cannot be inverted. Two extensions address this and related issues:

- Regularized CCA: provides solutions when $n < d$ and improves the robustness of the projections in the case of noisy observations (a sketch follows this list)
- Sparse CCA: produces sparse solutions, which facilitates interpretability
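A minimal sketch of the ridge-style regularization, assuming an isotropic penalty $\gamma I$ added to each within-view covariance; the value of $\gamma$ and the function name `rcca` are illustrative choices, not from the text.

```python
import numpy as np

def rcca(X, Y, gamma=1e-2):
    # Regularized CCA: the ridge term gamma*I keeps the within-view
    # covariances invertible even when n < d.
    n = X.shape[0]
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Sxx = Xc.T @ Xc / n + gamma * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + gamma * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    def inv_sqrt(S):
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T
    A, B = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, lams, Vt = np.linalg.svd(A @ Sxy @ B)
    return A @ U, B @ Vt.T, lams

# Runs even with more features than samples (n = 50 < d = 100).
rng = np.random.default_rng(2)
Xh = rng.normal(size=(50, 100))
Yh = rng.normal(size=(50, 80))
Wx, Wy, lams = rcca(Xh, Yh)
print(lams[:3])
```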
Nonlinear extensions:
- Kernel CCA
- Deep CCA