The following properties of matrix inversion are extremely important when deriving derivatives for parameter estimation.
1. Gradient of Matrix Inversion
$$\frac{\partial}{\partial \theta} K^{-1} = -K^{-1} \frac{\partial K}{\partial \theta} K^{-1}$$
where $\frac{\partial K}{\partial \theta}$ is the matrix of elementwise derivatives. Note the minus sign, which follows from differentiating $K K^{-1} = I$.
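As a quick sanity check, the sketch below compares this analytic gradient against a finite-difference gradient of the inverse. The parametrized matrix `K(theta)` (a squared-exponential Gram matrix on five fixed points, plus jitter) is an illustrative assumption, not part of the derivation; only numpy is assumed.

```python
import numpy as np

def K(theta):
    # Toy parametrized SPD matrix: squared-exponential Gram matrix on fixed 1-D inputs, plus jitter.
    x = np.linspace(0, 1, 5)[:, None]
    D2 = (x - x.T) ** 2
    return np.exp(-D2 / (2 * theta ** 2)) + 1e-6 * np.eye(5)

theta, eps = 0.7, 1e-6
Kinv = np.linalg.inv(K(theta))
dK = (K(theta + eps) - K(theta - eps)) / (2 * eps)      # elementwise dK/dtheta (central difference)
analytic = -Kinv @ dK @ Kinv                            # the identity above
numeric = (np.linalg.inv(K(theta + eps)) - np.linalg.inv(K(theta - eps))) / (2 * eps)
print(np.max(np.abs(analytic - numeric)))               # should be tiny (finite-difference error)
```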
2. Gradient of Log-Determinant of a Matrix
$$ \frac{\partial}{\partial \theta} \log |K| = \operatorname{tr}\!\left(K^{-1} \frac{\partial K}{\partial \theta}\right) $$
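The same kind of finite-difference check works here; again the toy matrix `K(theta)` below is a made-up example (same form as in the previous sketch), assuming only numpy.

```python
import numpy as np

def K(theta):
    # Same toy parametrized SPD matrix as above (illustrative assumption).
    x = np.linspace(0, 1, 5)[:, None]
    return np.exp(-(x - x.T) ** 2 / (2 * theta ** 2)) + 1e-6 * np.eye(5)

theta, eps = 0.7, 1e-6
dK = (K(theta + eps) - K(theta - eps)) / (2 * eps)
analytic = np.trace(np.linalg.solve(K(theta), dK))      # tr(K^{-1} dK/dtheta)
numeric = (np.linalg.slogdet(K(theta + eps))[1]
           - np.linalg.slogdet(K(theta - eps))[1]) / (2 * eps)
print(analytic, numeric)                                 # should agree to several digits
```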
3. Computing the inverse of a submatrix given the inverse of the full matrix
Let
$$ \begin{bmatrix} \mathbf{A} & \mathbf{b} \\ \mathbf{c}^T & d \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{E} & \mathbf{f} \\ \mathbf{g}^T & h \end{bmatrix} $$
then,
$$ \mathbf{A}^{-1} = \mathbf{E} - \frac{\mathbf{f} \mathbf{g}^T}{h} $$
Why? See http://math.stackexchange.com/questions/208001/are-there-any-decompositions-of-a-symmetric-matrix-that-allow-for-the-inversion/208021#208021
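A small numerical sketch of this identity, assuming numpy: build a random symmetric positive-definite matrix, read off $\mathbf{E}$, $\mathbf{f}$, $\mathbf{g}$, $h$ from its inverse, and recover $\mathbf{A}^{-1}$ without inverting $\mathbf{A}$ directly.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
M = M @ M.T + 6 * np.eye(6)                 # random symmetric positive-definite matrix
A = M[:-1, :-1]                             # leading (n-1) x (n-1) block

Minv = np.linalg.inv(M)
E, f, g, h = Minv[:-1, :-1], Minv[:-1, -1], Minv[-1, :-1], Minv[-1, -1]

A_inv_block = E - np.outer(f, g) / h        # A^{-1} = E - f g^T / h
A_inv_direct = np.linalg.inv(A)
print(np.max(np.abs(A_inv_block - A_inv_direct)))   # ~1e-12
```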
In particular, this property can be applied to leave-one-out (LOO) parameter optimization, where the held-out point $\mathbf{x}_i$ is permuted to the last row and column of the kernel matrix.
Let
$$ \mathbf{K}_{\mathbf{XX}} = \begin{bmatrix} \mathbf{K}_{\mathbf{X}_{-i} \mathbf{X}_{-i}} & \mathbf{k}(\mathbf{X}_{-i}, \mathbf{x}_{i}) \\ \mathbf{k}(\mathbf{X}_{-i}, \mathbf{x}_{i})^T & k(\mathbf{x}_i, \mathbf{x}_i) \end{bmatrix} $$
and
$$ \mathbf{K}_{\mathbf{XX}}^{-1} = \begin{bmatrix} \mathbf{E} & \mathbf{f} \\ \mathbf{g}^T & h \end{bmatrix} $$
where
$$ \mathbf{K}_{\mathbf{X}_{-i} \mathbf{X}_{-i}} = \mathbf{k}(\mathbf{X}_{-i}, \mathbf{X}_{-i}), $$
with $\mathbf{X}_{-i}$ denoting all inputs except $\mathbf{x}_i$.
Then
$$ \mathbf{K}_{\mathbf{X}_{-i} \mathbf{X}_{-i}} ^ {-1} = \mathbf{E} - \frac{\mathbf{f}\mathbf{g}^T}{h} $$
which states
$$ K_{X_{-i}X_{-i}}^{-1} = V_{-i,\,-i} - \frac{V_{-i,\,i}\, V_{i,\,-i}}{V_{i,\,i}} $$
where $V = K_{X_{1:n}X_{1:n}}^{-1}$ and the subscript $-i$ selects all indices except $i$.
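A sketch of the LOO use case, assuming numpy; the data `X`, the squared-exponential kernel, and the jitter below are illustrative assumptions. Given $V = \mathbf{K}_{\mathbf{XX}}^{-1}$ computed once, each leave-one-out inverse is recovered with a rank-one (outer-product) correction instead of a fresh $O(n^3)$ inversion.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
X = rng.standard_normal((n, 2))
# Toy squared-exponential kernel matrix on the n points, plus jitter.
K = np.exp(-0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)) + 1e-6 * np.eye(n)
V = np.linalg.inv(K)                        # full inverse, computed once

i = 3                                       # index of the left-out point
mask = np.arange(n) != i
K_loo_inv = V[np.ix_(mask, mask)] - np.outer(V[mask, i], V[i, mask]) / V[i, i]

# Compare against directly inverting the (n-1) x (n-1) submatrix.
direct = np.linalg.inv(K[np.ix_(mask, mask)])
print(np.max(np.abs(K_loo_inv - direct)))   # should be near machine precision
```

Looping `i` over all points reuses the same `V`, which is what makes LOO objectives cheap to evaluate relative to refitting from scratch for every held-out point.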