1. **Problem Statement:** We have data from 15 wearable sensors, each sampled once per second for a day. Assuming continuous recording, that is $n = 86{,}400$ time points and about 1.3 million individual readings ($86{,}400 \times 15$). We want to apply Principal Component Analysis (PCA) to reduce dimensionality while retaining as much of the variance in the data as possible.

2. **Data Matrix Construction:** Let $X$ be the data matrix in which each row is a time point (one second) and each column is a sensor. $X$ is therefore an $n \times 15$ matrix, where $n$ is the number of seconds recorded in the day. This layout is appropriate because PCA analyzes the covariance among features (sensors), so columns are variables and rows are observations.

3. **Centering the Data:** Subtract the mean of each column from every entry in that column:

   $$X_{\text{centered}} = X - \mathbf{1}_n \mu^T$$

   where $\mu$ is the $15 \times 1$ vector of column means and $\mathbf{1}_n$ is an $n \times 1$ vector of ones. Centering gives each sensor zero mean, which is necessary because covariance measures joint variability around the mean.

4. **Covariance Matrix Computation:** The sample covariance matrix $C$ is the $15 \times 15$ symmetric matrix

   $$C = \frac{1}{n-1} X_{\text{centered}}^T X_{\text{centered}}$$

   Without centering, $\frac{1}{n-1} X^T X$ would mix each sensor's mean offset into every entry, and the leading component would tend to point toward the mean rather than along the direction of greatest variation. With centering, the diagonal of $C$ holds each sensor's variance and the off-diagonal entries hold the covariances between sensors.

5. **Eigen Decomposition and Principal Components:** Solve the eigenvalue problem

   $$C v_i = \lambda_i v_i, \quad i = 1, \dots, 15$$

   where $\lambda_i$ are the eigenvalues and $v_i$ the eigenvectors. Because $C$ is symmetric and positive semidefinite, the eigenvalues are real and nonnegative and the eigenvectors can be chosen orthonormal. Order them so that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{15}$. Each eigenvector $v_i$ is a principal component direction, and $\lambda_i$ is the variance of the data along that direction. Large eigenvalues correspond to directions in which the sensor readings vary most.

6. **Variance Explained by First Five Components:** The total variance is the sum of all eigenvalues:

   $$\text{Total Variance} = \sum_{i=1}^{15} \lambda_i$$

   The variance explained by the first five components is

   $$\text{Explained Variance} = \sum_{i=1}^{5} \lambda_i$$

   so the fraction of variance retained is

   $$\frac{\sum_{i=1}^{5} \lambda_i}{\sum_{i=1}^{15} \lambda_i}$$

   Keeping only these components reduces storage (5 values per time point instead of 15), speeds up downstream computation (smaller matrices), and can reduce noise, provided the noise lies mostly in the discarded low-variance directions.

7. **Projection and Reconstruction:** Project the centered data onto the reduced space:

   $$Z = X_{\text{centered}} W$$

   where $W$ is the $15 \times 5$ matrix whose columns are the first five eigenvectors, so $Z$ is $n \times 5$. Reconstruct an approximation of the original data with

   $$\hat{X} = Z W^T + \mathbf{1}_n \mu^T$$

   The squared reconstruction error equals the variance that was discarded:

   $$\lVert X - \hat{X} \rVert_F^2 = (n-1) \sum_{i=6}^{15} \lambda_i$$

   so the error grows as fewer components are retained.

8. **Covariance Eigen Decomposition vs SVD:** SVD decomposes the centered data directly:

   $$X_{\text{centered}} = U \Sigma V^T$$

   The columns of $V$ are the eigenvectors of $C$, and the singular values $\sigma_i$ give the eigenvalues through $\lambda_i = \sigma_i^2 / (n-1)$. SVD is numerically more stable because it never forms $X_{\text{centered}}^T X_{\text{centered}}$ explicitly, and it handles rank-deficient or ill-conditioned data better. The numerical issue with the covariance route is that forming the product squares the condition number of the data, so small eigenvalues lose precision and can even come out slightly negative in floating point.

9. **Limitation of PCA and Alternative:** PCA finds only linear combinations of the sensors and ranks them by variance, so it may miss nonlinear activity patterns or discard low-variance directions that still matter for telling activities apart. An alternative is Kernel PCA, or another nonlinear dimensionality reduction method, which can capture such structure and may improve recognition of complex activities.
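
The steps above (centering, covariance, eigen decomposition, explained variance, projection, reconstruction, and the SVD comparison) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the data are synthetic stand-ins for the sensor readings, `n` is kept small so the script runs quickly, and the function names `pca_fit`, `pca_project`, and `pca_reconstruct` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def pca_fit(X, k):
    """Center X (n x d), eigendecompose its covariance, keep k components."""
    mu = X.mean(axis=0)                    # column means, shape (d,)
    Xc = X - mu                            # broadcasting subtracts mu from every row
    C = (Xc.T @ Xc) / (X.shape[0] - 1)     # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh assumes symmetry, returns ascending order
    order = np.argsort(eigvals)[::-1]      # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                     # d x k matrix of leading eigenvectors
    explained = eigvals[:k].sum() / eigvals.sum()
    return mu, W, eigvals, explained


def pca_project(X, mu, W):
    """Z = (X - 1 mu^T) W, shape (n, k)."""
    return (X - mu) @ W


def pca_reconstruct(Z, mu, W):
    """X_hat = Z W^T + 1 mu^T, shape (n, d)."""
    return Z @ W.T + mu


# Synthetic stand-in for the sensor matrix (assumption: 5 latent signals
# mixed into 15 channels, plus small noise and a per-sensor offset).
rng = np.random.default_rng(0)
n, d, k = 2000, 15, 5
latent = rng.normal(size=(n, k)) * np.array([5.0, 4.0, 3.0, 2.0, 1.5])
mixing = rng.normal(size=(k, d))
offsets = 10.0 * rng.normal(size=d)
X = latent @ mixing + 0.1 * rng.normal(size=(n, d)) + offsets

mu, W, eigvals, explained = pca_fit(X, k)
Z = pca_project(X, mu, W)
X_hat = pca_reconstruct(Z, mu, W)

# Squared reconstruction error versus (n - 1) times the discarded eigenvalues.
sq_error = np.sum((X - X_hat) ** 2)
discarded = (n - 1) * eigvals[k:].sum()

# SVD route: singular values of the centered data give the same eigenvalues.
_, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
svd_eigvals = s ** 2 / (n - 1)

print(f"fraction of variance explained by {k} components: {explained:.4f}")
print(f"squared reconstruction error: {sq_error:.4f}")
print(f"(n - 1) * sum of discarded eigenvalues: {discarded:.4f}")
```

For real sensor data, `X` would be loaded from storage instead of generated, and the number of retained components would be chosen by inspecting the explained-variance fraction rather than fixed at five in advance. The rows of `Vt` from the SVD match the columns of the eigenvector matrix only up to sign, which is why the comparison here uses eigenvalues rather than vectors.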