SVD Factorization

The importance of the SVD hardly needs restating; this post walks through the derivation of the singular value decomposition.

1. \(A \mathbf {v}_i = \sigma_i \mathbf{u}_i\)

If you still remember this figure,

it contains a deep observation:

the matrix-vector product \(A \mathbf{x}\) maps the row space of \(A\) onto the column space of \(A\).

At the same time, we know that

the row rank of a matrix equals its column rank; in other words, the row space and the column space of a matrix have the same dimension.

It is therefore natural to explore the relationship between a basis of the row space and a basis of the column space.
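The rank equality can be checked numerically. A minimal sketch with NumPy, using a hypothetical \(4 \times 3\) matrix whose third column is the sum of the first two (so its rank is 2):

```python
import numpy as np

# A hypothetical 4x3 matrix of rank 2 (third column = first + second).
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 1., 3.],
              [1., 2., 3.]])

# Column rank of A equals row rank (= column rank of A^T).
print(np.linalg.matrix_rank(A))    # → 2
print(np.linalg.matrix_rank(A.T))  # → 2
```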

Suppose \(A\) is an \(m \times n\) real matrix with \(r = \mathrm{rank}\,A \le \min \{m, n\}\). Let \(\{ \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}\) be a basis of the row space \(R(A)\) (i.e. \(C(A^T)\)) of \(A\), and let \(\{ \mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}\) be a basis of the column space \(C(A)\). Motivated by the observation above, we aim for the equality in the section title: \[
A \mathbf {v}_i = \sigma_i \mathbf {u}_i
\]

By the properties of Gramian matrices (from the post on special matrices):

the eigenvalues of \(A^TA\) are all nonnegative (it is positive semidefinite); write them as \(\sigma_1^2, \ldots, \sigma_n^2\), with corresponding orthonormal eigenvectors \(\mathbf {v}_1, \ldots, \mathbf {v}_n\), i.e.: \[
A^T A \mathbf {v}_i = \sigma_i^2 \mathbf {v}_i, \quad i=1, \ldots, n
\]
Rewriting this as a single matrix equation: \[
A^T A \begin {bmatrix} \mathbf {v}_1 &\cdots & \mathbf {v}_n \end {bmatrix} = \begin {bmatrix} \mathbf {v}_1 &\cdots & \mathbf {v}_n \end {bmatrix} \begin {bmatrix} \sigma_1^2 \\ &\ddots \\ & & \sigma^2_n \end {bmatrix}
\]
Let \(V = \begin {bmatrix} \mathbf {v}_1 &\cdots & \mathbf {v}_n \end {bmatrix}\). It is easy to verify that \(V^T V = I\), hence \(V^T = V^{-1}\); we call \(V\) a real orthogonal matrix. Multiplying the equation above on the right by \(V^T\) gives: \[
A^T A = V \begin {bmatrix} \sigma_1^2 \\ &\ddots \\ & & \sigma^2_n \end {bmatrix} V^T
\]
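This eigendecomposition of the Gramian can be reproduced numerically. A minimal sketch, using a hypothetical random \(4 \times 3\) matrix and NumPy's `eigh` (which returns orthonormal eigenvectors for a symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # a generic 4x3 real matrix

G = A.T @ A                      # the Gramian A^T A: symmetric positive semidefinite
evals, V = np.linalg.eigh(G)     # eigenvalues sigma_i^2 and orthonormal eigenvectors v_i

# All eigenvalues are nonnegative (up to round-off) ...
assert np.all(evals > -1e-12)
# ... V is orthogonal: V^T V = I ...
assert np.allclose(V.T @ V, np.eye(3))
# ... and A^T A = V diag(sigma_1^2, ..., sigma_n^2) V^T is recovered.
assert np.allclose(V @ np.diag(evals) @ V.T, G)
```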
Since \(\mathrm {rank} (A^TA) = \mathrm {rank} (A)\), and also \(\mathrm {rank} (A^TA) = \mathrm {rank}(\mathrm {diag} (\sigma_1^2, \ldots, \sigma_n^2))\),

if we order the eigenvalues so that \(\sigma_1 \ge \cdots \ge \sigma_r \gt 0\) and \(\sigma_{r+1} = \cdots = \sigma_{n} = 0\), then \(\mathrm {rank} (A^TA) = r\), and \(A^TA \mathbf {v}_j = \sigma_j^2 \mathbf {v}_j = \mathbf {0}\) for \(j = r+1, \ldots, n\).

This shows that \(\{ \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}\) lies in the null space of \(A^T A\). Since \(\{ \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}\) is a linearly independent set and \(\dim N(A^TA) = n-r\), the set \(\{ \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}\) is an orthonormal basis of \(N(A^TA)\). We proved earlier that \(A^TA\) and \(A\) have the same null space, so it is also an orthonormal basis of \(N(A)\).

Hence, by the orthogonal complement relation between subspaces, \(C(A^T) = N(A)^{\perp}\), and \(\{ \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}\) is an orthonormal basis of \(C(A^T)\).

Multiplying the eigenvalue equation above on the left by \(A\) gives: \[
AA^T A \mathbf {v}_i = \sigma_i^2 A \mathbf {v}_i, i = 1, \ldots, n
\]
This shows that the \(m \times m\) matrix \(AA^T\) has nonzero eigenvalues \(\sigma_1^2, \ldots, \sigma_r^2\), with corresponding eigenvectors \(A \mathbf {v}_1, \ldots, A \mathbf {v}_r\).

Since \(\Vert A \mathbf {v}_i \Vert^2 = \mathbf {v}_i^T A^T A \mathbf {v}_i = \sigma_i^2 \mathbf {v}_i^T \mathbf {v}_i = \sigma_i^2\),

the unit eigenvectors of \(AA^T\) are: \[
\mathbf {u}_i = \cfrac {A \mathbf {v}_i}{\sigma_i}, i= 1, \ldots, r
\]
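The construction \(\mathbf{u}_i = A\mathbf{v}_i / \sigma_i\) can be verified directly. A minimal sketch, using a hypothetical random \(5 \times 3\) matrix (which has full column rank almost surely, so \(r = n = 3\) and every \(\sigma_i > 0\)):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # full column rank almost surely, so r = 3

evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]   # sort sigma_i^2 in decreasing order
evals, V = evals[order], V[:, order]
sigma = np.sqrt(evals)            # singular values sigma_i

# u_i = A v_i / sigma_i: each column is automatically a unit vector,
# because ||A v_i||^2 = sigma_i^2.
U = (A @ V) / sigma
assert np.allclose(np.linalg.norm(U, axis=0), 1.0)
# And AA^T u_i = sigma_i^2 u_i: the u_i are eigenvectors of AA^T.
assert np.allclose(A @ A.T @ U, U * sigma**2)
```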
For \(1 \le i, j \le r\), \[
\mathbf {u}_i^T \mathbf {u}_j = \Big (\cfrac {A \mathbf {v}_i}{\sigma_i} \Big)^T \Big (\cfrac {A \mathbf {v}_j}{\sigma_j} \Big) = \cfrac {\sigma_j^2 \mathbf {v}_i^T \mathbf {v}_j } {\sigma_i \sigma_j} =
\begin{cases}
1, &\text{if } i = j \\
0, &\text{if } i \neq j
\end{cases}
\]
which shows that \(\{ \mathbf{u}_{1}, \ldots, \mathbf{u}_r\}\) is an orthonormal basis of \(C(A)\).

Conclusions:

  1. \(r = \mathrm {rank} A = \mathrm {rank} A^T = \mathrm {rank} (A^TA) = \mathrm {rank} (AA^T)\)
  2. \(C(A^T) = C(A^TA), C(A) = C(AA^T)\)
  3. \(N(A) = N(A^TA), N(A^T) = N(AA^T)\)
  4. The eigenvalues of \(A^TA\) are \(\sigma_1^2, \ldots, \sigma_n^2\), with corresponding orthonormal eigenvectors \(\mathbf {v}_1, \ldots, \mathbf {v}_n\)
  5. The eigenvalues of \(AA^T\) are \(\sigma_1^2, \ldots, \sigma_m^2\), with corresponding orthonormal eigenvectors \(\mathbf {u}_1, \ldots, \mathbf {u}_m\)
  6. \(A \mathbf {v}_i = \sigma_i \mathbf {u}_i, \sigma_i \gt 0, i = 1, \ldots, r\), and \(A \mathbf {v}_i = \mathbf {0}, i = r+1, \ldots, n\)
  7. \(A^T \mathbf {u}_j = \sigma_j \mathbf {v}_j, \sigma_j \gt 0, j = 1, \ldots, r\), and \(A^T \mathbf {u}_j = \mathbf {0}, j = r+1, \ldots, m\)

The figure of the four fundamental subspaces illustrates the following conclusions:

  • \(\{ \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}\) is a basis of the row space \(C(A^T)\)
  • \(\{ \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}\) is a basis of the null space \(N(A)\)
  • \(\{ \mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}\) is a basis of the column space \(C(A)\)
  • \(\{ \mathbf{u}_{r+1}, \ldots, \mathbf{u}_m\}\) is a basis of the left null space \(N(A^T)\)
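These four bases fall out of NumPy's `svd` directly. A minimal sketch, reusing a hypothetical \(4 \times 3\) matrix of rank 2: the trailing columns of \(V\) span \(N(A)\) and the trailing columns of \(U\) span \(N(A^T)\):

```python
import numpy as np

# A hypothetical 4x3 matrix of rank 2 (third column = first + second).
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 1., 3.],
              [1., 2., 3.]])

U, s, Vt = np.linalg.svd(A)   # full SVD: U is 4x4, Vt is 3x3
r = int(np.sum(s > 1e-10))    # number of nonzero singular values; here r = 2

V = Vt.T
# v_{r+1}, ..., v_n span the null space: A v = 0 for those columns.
assert np.allclose(A @ V[:, r:], 0)
# u_{r+1}, ..., u_m span the left null space: A^T u = 0 for those columns.
assert np.allclose(A.T @ U[:, r:], 0)
```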

2. SVD 分解

By the conclusions above, the \(m \times m\) matrix \(U = \begin {bmatrix} \mathbf {u}_1 &\cdots & \mathbf {u}_m \end {bmatrix}\) and the \(n \times n\) matrix \(V = \begin {bmatrix} \mathbf {v}_1 &\cdots & \mathbf {v}_n \end {bmatrix}\) satisfy \(U^T U = I_m\) and \(V^T V = I_n\). Then: \[
\begin {align}
AV &= A \begin {bmatrix} \mathbf {v}_1 &\cdots & \mathbf {v}_r & \mathbf {v}_{r+1} & \cdots & \mathbf {v}_n \end {bmatrix} \\
&= \begin {bmatrix} A\mathbf {v}_1 &\cdots & A\mathbf {v}_r & A\mathbf {v}_{r+1} & \cdots & A\mathbf {v}_n \end {bmatrix}\\
&= \begin {bmatrix} \sigma_1 \mathbf {u}_1 &\cdots & \sigma_r \mathbf {u}_r & \mathbf {0} & \cdots & \mathbf {0} \end {bmatrix} \\
&= \begin {bmatrix} \mathbf {u}_1 &\cdots & \mathbf {u}_m \end {bmatrix}
\begin {bmatrix} \sigma_1 &&& \vline & \\ & \ddots && \vline & 0 \\ && \sigma_r & \vline & \\ \hline & 0 && \vline &0 \end {bmatrix} \\
&= U \Sigma
\end {align}
\]
where the entries \(\sigma_1, \ldots, \sigma_p\) on the main diagonal of \(\Sigma\), with \(p = \min \{ m, n\}\), are called the singular values; hence \(\mathrm {rank}\,A\) equals the number of nonzero singular values. Multiplying \(AV = U\Sigma\) on the right by \(V^T\) yields: \[
A = U \Sigma V^T
\]
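The factorization \(A = U\Sigma V^T\) is what `np.linalg.svd` computes. A minimal sketch with a hypothetical random \(4 \times 3\) matrix; note that NumPy returns the singular values as a 1-D array, so the \(m \times n\) matrix \(\Sigma\) has to be assembled explicitly for the full form:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)       # full SVD: U is 4x4, s has min(m, n) entries, Vt is 3x3
Sigma = np.zeros((4, 3))          # Sigma is m x n with sigma_i on the main diagonal
np.fill_diagonal(Sigma, s)

assert np.allclose(U @ Sigma @ Vt, A)    # A = U Sigma V^T
assert np.allclose(A @ Vt.T, U @ Sigma)  # equivalently, A V = U Sigma
```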
Expanding the right-hand side, the SVD can also be written as a sum of rank-one matrices: \[
A = \sigma_1 \mathbf {u}_1 \mathbf {v}_1^T + \cdots + \sigma_r \mathbf {u}_r \mathbf {v}_r^T
\]
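The rank-one expansion is easy to confirm numerically. A minimal sketch, again with a hypothetical random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# Sum of the rank-one matrices sigma_i * u_i v_i^T recovers A exactly.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
assert np.allclose(A_sum, A)
```

Truncating the sum after the first \(k < r\) terms gives a rank-\(k\) matrix that is, by the Eckart–Young theorem, the best rank-\(k\) approximation of \(A\) in the spectral and Frobenius norms.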
