# matrix derivative

## 1. Notation for matrix derivatives

$\begin{array}{c|lcr} \text {Type} & \text{Scalar} & \text{Vector} & \text{Matrix} \\ \hline \text{Scalar} & \cfrac {\partial y} {\partial x} & \cfrac {\partial \mathbf {y}} {\partial x} & \cfrac {\partial \mathbf{Y}} {\partial x} \\ \text{Vector} & \cfrac {\partial y} {\partial \mathbf {x}} & \cfrac {\partial \mathbf {y}} {\partial \mathbf {x}} \\ \text{Matrix} & \cfrac {\partial y} {\partial \mathbf {X}} \\ \end{array}$

When $$x$$ is a scalar: $\begin{array}{cc} \text{numerator layout} & \text{denominator layout} \\ \hline \cfrac {\partial y} {\partial x} & \cfrac {\partial y} {\partial x} \\ \cfrac {\partial \mathbf {y}} {\partial x} = \begin {bmatrix} \cfrac {\partial y_1} {\partial x} \\ \vdots \\\cfrac {\partial y_m} {\partial x} \end {bmatrix} & \cfrac {\partial \mathbf {y}} {\partial x} = \begin {bmatrix} \cfrac {\partial y_1} {\partial x} & \cdots & \cfrac {\partial y_m} {\partial x} \end {bmatrix} = \cfrac {\partial \mathbf {y}^T} {\partial x}\\ \cfrac {\partial \mathbf {Y}} {\partial x} = \begin {bmatrix} \cfrac {\partial y_{11}} {\partial x} & \cdots & \cfrac {\partial y_{1n}} {\partial x} \\ \vdots & \ddots & \vdots \\ \cfrac {\partial y_{m1}} {\partial x} & \cdots & \cfrac {\partial y_{mn}} {\partial x} \end {bmatrix} \end {array}$

When $$\mathbf {x}$$ is a vector: $\begin{array}{cc} \text{numerator layout} & \text{denominator layout} \\ \hline \cfrac {\partial y} {\partial \mathbf {x}} = \begin {bmatrix} \cfrac {\partial y} {\partial x_1} & \cdots & \cfrac {\partial y} {\partial x_n} \end {bmatrix} = \cfrac {\partial y} {\partial \mathbf {x}^T} & \cfrac {\partial y} {\partial \mathbf {x}} = \begin {bmatrix} \cfrac {\partial y} {\partial x_1} \\ \vdots \\\cfrac {\partial y} {\partial x_n} \end {bmatrix} \\ \cfrac {\partial \mathbf {y}} {\partial \mathbf {x}} = \begin {bmatrix} \cfrac {\partial y_{1}} {\partial x_1} & \cdots & \cfrac {\partial y_{1}} {\partial x_n} \\ \vdots & \ddots & \vdots \\ \cfrac {\partial y_{m}} {\partial x_1} & \cdots & \cfrac {\partial y_{m}} {\partial x_n} \end {bmatrix} & \cfrac {\partial \mathbf {y}} {\partial \mathbf {x}} = \begin {bmatrix} \cfrac {\partial y_{1}} {\partial x_1} & \cdots & \cfrac {\partial y_{m}} {\partial x_1} \\ \vdots & \ddots & \vdots \\ \cfrac {\partial y_{1}} {\partial x_n} & \cdots & \cfrac {\partial y_{m}} {\partial x_n} \end {bmatrix} \\ \equiv \cfrac {\partial \mathbf {y}} {\partial \mathbf {x}^T} & \equiv \cfrac {\partial \mathbf {y}^T} {\partial \mathbf {x}} \end {array}$

When $$\mathbf {X}$$ is a matrix: $\begin{array}{cc} \text{numerator layout} & \text{denominator layout} \\ \hline \cfrac {\partial y} {\partial \mathbf {X}} = \begin {bmatrix} \cfrac {\partial y} {\partial x_{11}} & \cdots & \cfrac {\partial y} {\partial x_{m1}} \\ \vdots & \ddots & \vdots \\ \cfrac {\partial y} {\partial x_{1n}} & \cdots & \cfrac {\partial y} {\partial x_{mn}} \end {bmatrix} & \cfrac {\partial y} {\partial \mathbf {X}} = \begin {bmatrix} \cfrac {\partial y} {\partial x_{11}} & \cdots & \cfrac {\partial y} {\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \cfrac {\partial y} {\partial x_{m1}} & \cdots & \cfrac {\partial y} {\partial x_{mn}} \end {bmatrix} \\ \equiv \cfrac {\partial y} {\partial \mathbf {X}^T} & \equiv \cfrac {\partial y} {\partial \mathbf {X}} \end {array}$
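
The two layouts for $$\cfrac {\partial \mathbf {y}} {\partial \mathbf {x}}$$ differ only by a transpose, which is easiest to see from the shapes. Below is a minimal sketch in plain Python; the function `f` and the helper names are hypothetical and chosen only for illustration, and the derivative is approximated by central differences.

```python
def f(x):
    # Hypothetical map from R^3 to R^2: y1 = x1 * x2, y2 = x2 + x3^2
    return [x[0] * x[1], x[1] + x[2] ** 2]

def jacobian_numerator(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: entry (i, j) is dy_i / dx_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def transpose(M):
    return [list(row) for row in zip(*M)]

x0 = [1.0, 2.0, 3.0]
J_num = jacobian_numerator(f, x0)  # rows follow y, columns follow x
J_den = transpose(J_num)           # denominator layout: rows follow x, columns follow y

print(len(J_num), len(J_num[0]))   # 2 3
print(len(J_den), len(J_den[0]))   # 3 2
```

With $$\mathbf {y} \in \mathbb {R}^m$$ and $$\mathbf {x} \in \mathbb {R}^n$$, the numerator layout gives an $$m \times n$$ matrix and the denominator layout an $$n \times m$$ matrix.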

## 2. 矩阵导数公式

### 2.1 scalar derivative rules

1. $$\cfrac {\partial (u+v)} {\partial x} = \cfrac {\partial u} {\partial x} + \cfrac {\partial v} {\partial x}$$

2. $$\cfrac {\partial uv} {\partial x} = u\cfrac {\partial v} {\partial x} + v\cfrac {\partial u} {\partial x}$$
The proof follows from the definition of the derivative: evaluate $\lim_{h \to 0} \cfrac {u(x+h) v(x+h) - u(x) v(x)} {h}$

3. $$\cfrac {\partial g(u)} {\partial x} = \cfrac {\partial g(u)} {\partial u} \cfrac {\partial u} {\partial x}$$ (chain rule)

4. $$\cfrac {\partial f(g(u))} {\partial x} = \cfrac {\partial f(g)} {\partial g} \cfrac {\partial g(u)} {\partial u} \cfrac {\partial u} {\partial x}$$ (chain rule)
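
The product rule in item 2 can be obtained from the limit definition by adding and subtracting $$u(x+h) v(x)$$ in the numerator. A sketch, assuming $$u$$ and $$v$$ are both differentiable (hence continuous) at $$x$$:

$$\begin{aligned} \lim_{h \to 0} \cfrac {u(x+h) v(x+h) - u(x) v(x)} {h} &= \lim_{h \to 0} \left[ u(x+h) \cfrac {v(x+h) - v(x)} {h} + v(x) \cfrac {u(x+h) - u(x)} {h} \right] \\ &= u \cfrac {\partial v} {\partial x} + v \cfrac {\partial u} {\partial x} \end{aligned}$$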

### 2.2 derivatives of constants (no functional dependence)

Here the scalar $$a$$, the vector $$\mathbf {a}$$ and the matrix $$\mathbf {A}$$ are not functions of $$x, \mathbf {x}, \mathbf {X}$$

1. $$\cfrac {d \mathbf {a}} {dx} = \mathbf {0}$$ (column vector)
2. $$\cfrac {da} {d \mathbf {x}} = \mathbf {0}^T$$ (row vector, in numerator layout)
3. $$\cfrac {da} {d \mathbf {X}} = \mathbf {0}^T$$ (zero matrix with the shape of $$\mathbf {X}^T$$, in numerator layout)
4. $$\cfrac {d \mathbf {a}} {d \mathbf {x}} = \mathbf {0}$$ (zero matrix)

### 2.3 derivatives of vector by scalar

1. $$\cfrac {\partial a \mathbf {u}} {\partial x} = a \cfrac {\partial \mathbf {u}} {\partial x}$$, where $$a$$ is not a function of $$x$$

2. $$\cfrac {\partial \mathbf {A} \mathbf {u}} {\partial x} = \mathbf {A} \cfrac {\partial \mathbf {u}} {\partial x}$$, where $$\mathbf {A}$$ is not a function of $$x$$

3. $$\cfrac {\partial \mathbf {u}^T} {\partial x} = (\cfrac {\partial \mathbf {u}} {\partial x})^T$$

4. $$\cfrac {\partial (\mathbf {u} + \mathbf {v})} {\partial x} = \cfrac {\partial \mathbf {u}} {\partial x} + \cfrac {\partial \mathbf {v}} {\partial x}$$

5. $$\cfrac {\partial \mathbf {g(u)}} {\partial x} = \cfrac {\partial \mathbf {g(u)}} {\partial \mathbf {u}} \cfrac {\partial \mathbf {u}} {\partial x}$$ (chain rule)
assuming a consistent matrix layout is used for every factor

6. $$\cfrac {\partial \mathbf {f(g(u))}} {\partial x} = \cfrac {\partial \mathbf {f(g)}} {\partial \mathbf {g}} \cfrac {\partial \mathbf {g(u)}} {\partial \mathbf {u}} \cfrac {\partial \mathbf {u}} {\partial x}$$

### 2.4 derivative of matrix by scalar

1. $$\cfrac {\partial a \mathbf {U}} {\partial x} = a \cfrac {\partial \mathbf {U}} {\partial x}$$ , where $$a$$ is not a function of $$x$$
2. $$\cfrac {\partial \mathbf {AUB}} {\partial x} = \mathbf {A} \cfrac {\partial \mathbf {U}} {\partial x} \mathbf {B}$$ , where $$\mathbf {A}$$ and $$\mathbf {B}$$ are not functions of $$x$$
3. $$\cfrac {\partial (\mathbf {U} + \mathbf {V})} {\partial x} = \cfrac {\partial \mathbf {U}} {\partial x} + \cfrac {\partial \mathbf {V}} {\partial x}$$
4. $$\cfrac {\partial \mathbf {UV}} {\partial x} =\mathbf {U} \cfrac {\partial \mathbf {V}} {\partial x} + \cfrac {\partial \mathbf {U}} {\partial x}\mathbf {V}$$ (product rule)
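
Because matrix multiplication does not commute, the order of the factors in the product rule (item 4) matters. The following sketch checks this numerically in plain Python; the matrices `U(x)` and `V(x)` and all helper names are hypothetical examples, and the reference derivative comes from central differences.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical matrix-valued functions of a scalar x, with hand-written derivatives
def U(x):
    return [[x, 1.0], [0.0, x * x]]

def dU(x):
    return [[1.0, 0.0], [0.0, 2 * x]]

def V(x):
    return [[1.0, x], [x, 2.0]]

def dV(x):
    return [[0.0, 1.0], [1.0, 0.0]]

def d_matrix(F, x, h=1e-6):
    """Entry-wise central difference of a matrix-valued function of a scalar."""
    Fp, Fm = F(x + h), F(x - h)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(Fp, Fm)]

x0 = 1.5
numeric = d_matrix(lambda x: matmul(U(x), V(x)), x0)

# Product rule with the factor order preserved: U dV/dx + dU/dx V
rule = matadd(matmul(U(x0), dV(x0)), matmul(dU(x0), V(x0)))

# Swapping the factors inside each term gives a different matrix
wrong_order = matadd(matmul(dV(x0), U(x0)), matmul(V(x0), dU(x0)))

print(max_abs_diff(numeric, rule))         # close to zero
print(max_abs_diff(numeric, wrong_order))  # clearly non-zero
```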

### 2.5 derivatives of scalar by vector

1. $$\cfrac {\partial a u} {\partial \mathbf {x}} = a \cfrac {\partial u} {\partial \mathbf {x}}$$, where $$a$$ is not a function of $$\mathbf {x}$$

2. $$\cfrac {\partial (u + v)} {\partial \mathbf {x}} = \cfrac {\partial u} {\partial \mathbf {x}} + \cfrac {\partial v} {\partial \mathbf {x}}$$

3. $$\cfrac {\partial uv} {\partial \mathbf {x}} = u \cfrac {\partial v} {\partial \mathbf {x}} + v \cfrac {\partial u} {\partial \mathbf {x}}$$ (product rule)

4. $$\cfrac {\partial g(u)} {\partial \mathbf {x}} = \cfrac {\partial g(u)} {\partial u} \cfrac {\partial u} {\partial \mathbf {x}}$$ (chain rule)

5. $$\cfrac {\partial f(g(u))} {\partial \mathbf {x}} = \cfrac {\partial f(g)} {\partial g}\cfrac {\partial g(u)} {\partial u} \cfrac {\partial u} {\partial \mathbf {x}}$$ (chain rule)

6. $$\cfrac {\partial \mathbf {u}^T \mathbf {v}} {\partial \mathbf {x}} =\mathbf {u}^T \cfrac {\partial \mathbf {v}} {\partial \mathbf {x}} + \mathbf {v}^T\cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$ (product rule)
This is the single most important formula: most of the others can be derived from it.
Here $$\cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$ and $$\cfrac {\partial \mathbf {v}} {\partial \mathbf {x}}$$ are in numerator layout

7. $$\cfrac {\partial \mathbf {u}^T \mathbf {Av}} {\partial \mathbf {x}} =\mathbf {u}^T \mathbf {A} \cfrac {\partial \mathbf {v}} {\partial \mathbf {x}} + \mathbf {v}^T \mathbf {A}^T\cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$ (product rule)
where $$\cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$ and $$\cfrac {\partial \mathbf {v}} {\partial \mathbf {x}}$$ are in numerator layout
and $$\mathbf {A}$$ is not a function of $$\mathbf {x}$$
This can be viewed as $$\cfrac {\partial \mathbf {u}^T \mathbf {(Av)}} {\partial \mathbf {x}} = \mathbf {u}^T \cfrac {\partial \mathbf {Av}} {\partial \mathbf {x}} + (\mathbf {Av})^T\cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$
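
The inner-product rule in item 6 can be checked one component at a time. A sketch in numerator layout, assuming $$\mathbf {u}, \mathbf {v} \in \mathbb {R}^m$$ are differentiable functions of $$\mathbf {x} \in \mathbb {R}^n$$:

$$\cfrac {\partial \mathbf {u}^T \mathbf {v}} {\partial x_j} = \cfrac {\partial} {\partial x_j} \sum_{i=1}^m u_i v_i = \sum_{i=1}^m u_i \cfrac {\partial v_i} {\partial x_j} + \sum_{i=1}^m v_i \cfrac {\partial u_i} {\partial x_j}$$

The first sum is the $$j$$-th entry of the row vector $$\mathbf {u}^T \cfrac {\partial \mathbf {v}} {\partial \mathbf {x}}$$ and the second is the $$j$$-th entry of $$\mathbf {v}^T \cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$; collecting $$j = 1, \dots, n$$ gives item 6. Substituting $$\mathbf {Av}$$ for $$\mathbf {v}$$ and using $$\cfrac {\partial \mathbf {Av}} {\partial \mathbf {x}} = \mathbf {A} \cfrac {\partial \mathbf {v}} {\partial \mathbf {x}}$$ together with $$(\mathbf {Av})^T = \mathbf {v}^T \mathbf {A}^T$$ gives item 7.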

### 2.6 derivative of scalar by matrix

1. $$\cfrac {\partial a u} {\partial \mathbf {X}} = a \cfrac {\partial u} {\partial \mathbf {X}}$$, where $$a$$ is not a function of $$\mathbf {X}$$
2. $$\cfrac {\partial (u + v)} {\partial \mathbf {X}} = \cfrac {\partial u} {\partial \mathbf {X}} + \cfrac {\partial v} {\partial \mathbf {X}}$$
3. $$\cfrac {\partial uv} {\partial \mathbf {X}} = u \cfrac {\partial v} {\partial \mathbf {X}} + v \cfrac {\partial u} {\partial \mathbf {X}}$$ (product rule)
4. $$\cfrac {\partial g(u)} {\partial \mathbf {X}} = \cfrac {\partial g(u)} {\partial u} \cfrac {\partial u} {\partial \mathbf {X}}$$ (chain rule)
5. $$\cfrac {\partial f(g(u))} {\partial \mathbf {X}} = \cfrac {\partial f(g)} {\partial g}\cfrac {\partial g(u)} {\partial u} \cfrac {\partial u} {\partial \mathbf {X}}$$ (chain rule)

### 2.7 derivative of vector by vector

1. $$\cfrac {\partial a \mathbf {u}} {\partial \mathbf {x}} = a \cfrac {\partial \mathbf {u}} {\partial \mathbf {x}} + \mathbf {u} \cfrac {\partial a } {\partial \mathbf {x}}$$ (product rule)
2. $$\cfrac {\partial (\mathbf {u} + \mathbf {v})} {\partial \mathbf {x}} = \cfrac {\partial \mathbf {u}} {\partial \mathbf {x}} + \cfrac {\partial \mathbf {v}} {\partial \mathbf {x}}$$
3. $$\cfrac {\partial \mathbf {Au}} {\partial \mathbf {x}} = \mathbf {A} \cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$, where $$\mathbf {A}$$ is not a function of $$\mathbf {x}$$
4. $$\cfrac {\partial \mathbf {g(u)}} {\partial \mathbf {x}} = \cfrac {\partial \mathbf {g(u)}} {\partial \mathbf {u}} \cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$ (chain rule)
5. $$\cfrac {\partial \mathbf {f(g(u))}} {\partial \mathbf {x}} = \cfrac {\partial \mathbf {f(g)}} {\partial \mathbf {g}}\cfrac {\partial \mathbf {g(u)}} {\partial \mathbf {u}} \cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$ (chain rule)
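
In numerator layout the chain rule in item 4 is a product of Jacobians, with the outer derivative on the left. The sketch below compares that product against a finite-difference Jacobian of the composite map; the functions `u` and `g`, their hand-written Jacobians, and the helper names are hypothetical examples.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical inner map u: R^2 -> R^3 and outer map g: R^3 -> R^2
def u(x):
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]

def g(w):
    return [w[0] + w[1] * w[2], w[0] * w[2]]

def du_dx(x):
    # 3 x 2 Jacobian of u in numerator layout
    return [[x[1], x[0]], [1.0, 1.0], [0.0, 2 * x[1]]]

def dg_du(w):
    # 2 x 3 Jacobian of g in numerator layout
    return [[1.0, w[2], w[1]], [w[2], 0.0, w[0]]]

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: entry (i, j) is df_i / dx_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x0 = [0.5, -1.2]
chain = matmul(dg_du(u(x0)), du_dx(x0))      # (2 x 3) times (3 x 2) gives 2 x 2
numeric = jacobian(lambda x: g(u(x)), x0)    # direct 2 x 2 Jacobian of the composite

err = max(abs(c - n) for rc, rn in zip(chain, numeric) for c, n in zip(rc, rn))
print(err)  # close to zero
```

The shapes only line up in this order: $$(2 \times 3)(3 \times 2)$$ gives the $$2 \times 2$$ Jacobian of the composite, whereas the reversed product would be $$3 \times 3$$.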

### 2.8 some commonly used derivatives

1. $$\cfrac {d \mathbf {x}} {d \mathbf {x}} = \mathbf {I}$$

2. $$\cfrac {d \mathbf {a}^T \mathbf {x}} {d \mathbf {x}} = \cfrac {d \mathbf {x}^T \mathbf {a}} {d \mathbf {x}} =\mathbf {a}^T$$
The proof can use $$\cfrac {\partial \mathbf {u}^T \mathbf {v}} {\partial \mathbf {x}} =\mathbf {u}^T \cfrac {\partial \mathbf {v}} {\partial \mathbf {x}} + \mathbf {v}^T\cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$

3. $$\cfrac {d (\mathbf {x}^T \mathbf {a})^2} {d \mathbf {x}} =2 \mathbf {x}^T \mathbf {a}\mathbf {a}^T$$
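Apply the chain rule with $$u = \mathbf {x}^T \mathbf {a}$$, then use formula 2 for $$\cfrac {d u} {d \mathbf {x}}$$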

4. $$\cfrac {d \mathbf {x}^T \mathbf {x}} {d \mathbf {x}} = 2 \mathbf {x}^T$$
The proof is similar to the previous ones.
A special case makes it easy to remember:
$$s = \mathbf {x}^T \mathbf {x} = \sum_i x_i^2$$ , so $$\cfrac {\partial s} {\partial x_i} = 2x_i$$ , and therefore $$\cfrac {d s} {d \mathbf {x}} = 2 \mathbf {x}^T$$

5. $$\cfrac {d \mathbf {A}\mathbf {x}} {d \mathbf {x}} = \mathbf {A}$$
The proof can use $$\cfrac {\partial \mathbf {Au}} {\partial \mathbf {x}} = \mathbf {A} \cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$

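6. $$\cfrac {d \mathbf {x}^T \mathbf {A}\mathbf {x}} {d \mathbf {x}} = \mathbf {x}^T (\mathbf {A}+\mathbf {A}^T)$$
Again use the product rule $$\cfrac {\partial \mathbf {u}^T \mathbf {Av}} {\partial \mathbf {x}} =\mathbf {u}^T \mathbf {A} \cfrac {\partial \mathbf {v}} {\partial \mathbf {x}} + \mathbf {v}^T \mathbf {A}^T\cfrac {\partial \mathbf {u}} {\partial \mathbf {x}}$$ with $$\mathbf {u} = \mathbf {v} = \mathbf {x}$$, which gives $$\mathbf {x}^T \mathbf {A} + \mathbf {x}^T \mathbf {A}^T$$. When $$\mathbf {A}$$ is symmetric this reduces to $$2 \mathbf {x}^T \mathbf {A}$$.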
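
Several of these identities can be sanity-checked with finite differences. The sketch below works in numerator layout, so each gradient is a row vector. The vector `a`, the deliberately non-symmetric matrix `A`, the point `x0` and the helper names are arbitrary hypothetical choices. For the quadratic form, the product rule with $$\mathbf {u} = \mathbf {v} = \mathbf {x}$$ yields $$\mathbf {x}^T \mathbf {A} + \mathbf {x}^T \mathbf {A}^T$$, which is the expression compared here.

```python
def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def matvec(A, x):
    # A x, returned as a flat list
    return [dot(row, x) for row in A]

def vecmat(x, A):
    # x^T A, returned as a flat list (a row vector)
    return [dot(x, col) for col in zip(*A)]

def grad_row(f, x, h=1e-6):
    """Central-difference gradient of a scalar function, as a row vector."""
    out = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        out.append((f(xp) - f(xm)) / (2 * h))
    return out

a = [2.0, -1.0, 0.5]
A = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [0.5, 0.0, 2.0]]  # non-symmetric on purpose
x0 = [0.3, -0.7, 1.1]

# name -> (function of x, closed-form gradient evaluated at x0)
checks = {
    "a^T x": (lambda x: dot(a, x), a),
    "(x^T a)^2": (lambda x: dot(x, a) ** 2, [2 * dot(x0, a) * ai for ai in a]),
    "x^T x": (lambda x: dot(x, x), [2 * xi for xi in x0]),
    "x^T A x": (
        lambda x: dot(x, matvec(A, x)),
        # x^T A + x^T A^T, where x^T A^T is the transpose of A x
        [p + q for p, q in zip(vecmat(x0, A), matvec(A, x0))],
    ),
}

errors = {
    name: max(abs(n - e) for n, e in zip(grad_row(f, x0), expected))
    for name, (f, expected) in checks.items()
}
for name, e in errors.items():
    print(name, e)  # each error should be close to zero
```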

