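1. **Problem Statement:** We want to understand Principal Component Analysis (PCA) in high dimensions, focusing on dimension reduction, spectral properties of covariance matrices, and the spike model.

2. **PCA as Best $d$-Dimensional Affine Fit:** Given data points $x_1, \ldots, x_n \in \mathbb{R}^p$, PCA finds the $d$-dimensional affine subspace that minimizes the sum of squared distances to the data:
   $$\min_{\mu, V, \beta_k} \sum_{k=1}^n \|x_k - (\mu + V \beta_k)\|^2 \quad \text{s.t. } V^T V = I_d,$$
   where $\mu \in \mathbb{R}^p$ is the translation, $V = [v_1 \cdots v_d] \in \mathbb{R}^{p \times d}$ has orthonormal columns, and $\beta_k \in \mathbb{R}^d$ holds the coordinates of $x_k$ in that basis.

3. **Optimal Translation $\mu^*$:** Setting the gradient with respect to $\mu$ to zero shows that the sample mean is an optimal translation:
   $$\mu^* = \mu_n = \frac{1}{n} \sum_{k=1}^n x_k.$$

4. **Optimal Coefficients $\beta_k$:** For fixed $V$, the best $\beta_k$ is the orthogonal projection of the centered point onto the columns of $V$:
   $$\beta_k = V^T (x_k - \mu_n).$$

5. **Equivalent Optimization:** Substituting $\mu_n$ and $\beta_k$ gives the residual $\|x_k - \mu_n\|^2 - \|V^T (x_k - \mu_n)\|^2$ for each point. The first term does not depend on $V$, so minimizing the fit error is the same as maximizing the variance captured:
   $$\max_{V^T V = I_d} \mathrm{Tr}(V^T \Sigma_n V),$$
   where $\Sigma_n = \frac{1}{n-1} \sum_{k=1}^n (x_k - \mu_n)(x_k - \mu_n)^T$ is the sample covariance.

6. **Solution:** The columns of $V$ are the eigenvectors of $\Sigma_n$ associated with its $d$ largest eigenvalues.

7. **PCA as Variance Maximization:** Looking directly for the $d$ orthonormal directions that maximize the sample variance of the projected data leads to the same trace problem, because $\sum_{k=1}^n \|V^T (x_k - \mu_n)\|^2 = (n-1)\,\mathrm{Tr}(V^T \Sigma_n V)$. The two formulations are therefore equivalent.

8. **Computational Complexity:** Forming $\Sigma_n$ costs $O(np^2)$ and its eigendecomposition costs $O(p^3)$. Computing the SVD of the centered data matrix $X - \mu_n 1^T$ (with the $x_k$ as columns of $X$) avoids forming $\Sigma_n$ and costs $O(\min(n^2 p, p^2 n))$, which is a real saving when $n \ll p$.

9. **Choosing $d$:** Plot the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots$ of $\Sigma_n$ (a scree plot) and look for an "elbow", the point after which the eigenvalues level off. Choose $d$ as the number of eigenvalues before the elbow.

10. **High-Dimensional PCA and Marchenko-Pastur Law:** Suppose the entries of $X \in \mathbb{R}^{p \times n}$ are i.i.d. with mean zero and unit variance (for example, columns $x_k \sim N(0, I_p)$). When $p, n \to \infty$ with $p/n \to \gamma \leq 1$, the empirical distribution of the eigenvalues of the sample covariance $S_n = \frac{1}{n} X X^T$ converges to the Marchenko-Pastur distribution:
    $$dF_\gamma(\lambda) = \frac{\sqrt{(\gamma_+ - \lambda)(\lambda - \gamma_-)}}{2 \pi \gamma \lambda} 1_{[\gamma_-, \gamma_+]}(\lambda) \, d\lambda,$$
    where $\gamma_\pm = (1 \pm \sqrt{\gamma})^2$. Even though the true covariance is the identity, the sample eigenvalues spread over the whole interval $[\gamma_-, \gamma_+]$.

11. **Spike Model:** Consider $\Sigma = I + \beta v v^T$ with $v$ a unit vector and spike strength $\beta \geq 0$ (a scalar, unrelated to the coefficients $\beta_k$ above). The data are drawn as $x \sim N(0, \Sigma)$.

12. **BBP Phase Transition:** The largest eigenvalue $\lambda_{max}(S_n)$ of the sample covariance exhibits a phase transition:
    - If $\beta \leq \sqrt{\gamma}$, $\lambda_{max}(S_n) \to \gamma_+$ (the right edge of the Marchenko-Pastur support), so the spike is hidden in the bulk.
    - If $\beta > \sqrt{\gamma}$, $\lambda_{max}(S_n) \to (1 + \beta)\left(1 + \frac{\gamma}{\beta}\right) > \gamma_+$, so the top eigenvalue separates from the bulk.

13. **Eigenvector Alignment:** The leading eigenvector $v_{max}$ of $S_n$ carries information about $v$ only if $\beta > \sqrt{\gamma}$. In that regime $|\langle v_{max}, v \rangle|^2 \to \frac{1 - \gamma/\beta^2}{1 + \gamma/\beta}$, while for $\beta \leq \sqrt{\gamma}$ the squared overlap tends to $0$.

14. **Summary:** PCA finds the principal components as the top eigenvectors of the sample covariance matrix. In high dimensions, random matrix theory (the Marchenko-Pastur law) describes the eigenvalue distribution produced by noise alone. Spike models reveal a detectability threshold (the BBP transition) for low-rank signals in noise.

**Final answer:** The PCA components are the top $d$ eigenvectors of $\Sigma_n$. In high dimensions with $p/n \to \gamma$, the eigenvalues of a pure-noise sample covariance follow the Marchenko-Pastur law on $[\gamma_-, \gamma_+]$. In the spike model, the top eigenvalue separates from this bulk, and the top eigenvector aligns with $v$, if and only if $\beta > \sqrt{\gamma}$.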
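The recipe in steps 2 to 8 can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names `pca_eig` and `pca_svd`, the synthetic data, and the convention that data points are the columns of `X` are all assumptions made for the example. It computes the components once from the eigendecomposition of $\Sigma_n$ and once from the SVD of the centered data matrix, so that the two routes can be compared.

```python
import numpy as np

def pca_eig(X, d):
    """PCA via eigendecomposition of the sample covariance.
    X is p x n with data points as columns (an assumed convention)."""
    n = X.shape[1]
    mu = X.mean(axis=1, keepdims=True)        # sample mean mu_n (step 3)
    Xc = X - mu                               # centered data
    Sigma = Xc @ Xc.T / (n - 1)               # sample covariance (step 5)
    evals, evecs = np.linalg.eigh(Sigma)      # eigh returns ascending order
    order = np.argsort(evals)[::-1]           # sort descending
    return mu, evecs[:, order[:d]], evals[order]

def pca_svd(X, d):
    """PCA via SVD of the centered data matrix (step 8)."""
    n = X.shape[1]
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    # Left singular vectors are eigenvectors of Sigma_n;
    # squared singular values / (n - 1) are its eigenvalues.
    return mu, U[:, :d], s**2 / (n - 1)

# Synthetic data: independent coordinates with decreasing variance, shifted mean.
rng = np.random.default_rng(0)
p, n, d = 5, 200, 2
scales = np.array([5.0, 3.0, 1.0, 0.5, 0.1])
X = scales[:, None] * rng.standard_normal((p, n)) + 10.0

mu, V_eig, lam_eig = pca_eig(X, d)
_, V_svd, lam_svd = pca_svd(X, d)

# Coefficients beta_k = V^T (x_k - mu_n), one column per data point (step 4).
beta = V_eig.T @ (X - mu)

# Eigenvectors are only defined up to sign, so compare projectors V V^T.
same_subspace = np.allclose(V_eig @ V_eig.T, V_svd @ V_svd.T)
```

The sorted array `lam_eig` is what one would plot for the scree plot in step 9. Comparing the projectors `V @ V.T` rather than the columns of `V` avoids spurious mismatches caused by the sign ambiguity of eigenvectors.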
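The BBP transition in steps 11 to 13 can be checked numerically with a small simulation. The sketch below is illustrative only: the helper names `spike_top_eig` and `bbp_prediction`, the choice $v = e_1$, and the sizes $p = 500$, $n = 1000$ are assumptions for the example, and at finite $p, n$ the simulated values only approximate the limits.

```python
import numpy as np

def spike_top_eig(p, n, beta, rng):
    """Draw n samples from N(0, I + beta v v^T) and return the largest
    eigenvalue of S_n = X X^T / n and the squared overlap of its
    eigenvector with v. Here v = e_1, an arbitrary choice of unit vector."""
    v = np.zeros(p)
    v[0] = 1.0
    Z = rng.standard_normal((p, n))           # isotropic noise part
    g = rng.standard_normal(n)                # spike coefficients
    X = Z + np.sqrt(beta) * np.outer(v, g)    # columns have covariance I + beta v v^T
    S = X @ X.T / n
    evals, evecs = np.linalg.eigh(S)          # ascending order
    overlap = float(evecs[:, -1] @ v) ** 2
    return float(evals[-1]), overlap

def bbp_prediction(beta, gamma):
    """Limiting top eigenvalue and squared overlap from steps 12 and 13."""
    if beta <= np.sqrt(gamma):
        return (1 + np.sqrt(gamma)) ** 2, 0.0
    lam = (1 + beta) * (1 + gamma / beta)
    overlap = (1 - gamma / beta**2) / (1 + gamma / beta)
    return lam, overlap

rng = np.random.default_rng(1)
p, n = 500, 1000
gamma = p / n                                  # 0.5, so sqrt(gamma) is about 0.707

lam_weak, ov_weak = spike_top_eig(p, n, beta=0.3, rng=rng)      # below threshold
lam_strong, ov_strong = spike_top_eig(p, n, beta=2.0, rng=rng)  # above threshold

pred_weak = bbp_prediction(0.3, gamma)
pred_strong = bbp_prediction(2.0, gamma)

print(f"beta=0.3: lambda_max={lam_weak:.3f}, overlap={ov_weak:.3f}, predicted={pred_weak}")
print(f"beta=2.0: lambda_max={lam_strong:.3f}, overlap={ov_strong:.3f}, predicted={pred_strong}")
```

For $\beta = 0.3$ the top eigenvalue should sit near the bulk edge $\gamma_+$ with negligible overlap, while for $\beta = 2$ it should move close to $(1+\beta)(1+\gamma/\beta) = 3.75$ with a squared overlap near $0.7$.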