Broadly, model selection in Gaussian process regression falls into two categories: one is the discrete choice of a suitable kernel function, and the other is the continuous optimization of the hyper-parameters of the chosen kernel. In this post, we focus on the latter.
The hyper-parameters can be optimized by maximizing either the marginal likelihood or the leave-one-out cross-validation (LOO-CV) likelihood.
1. Marginal Likelihood and its derivative with respect to a hyper-parameter
2. Leave-one-out Likelihood and its derivative
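As a rough sketch of the two objectives listed above, the following NumPy code evaluates the log marginal likelihood, its derivative with respect to a single hyper-parameter, and the LOO-CV log likelihood. It assumes a squared-exponential kernel with unit signal variance, 1-D inputs, a zero mean function, and a fixed noise variance; the length-scale is the only hyper-parameter being differentiated. The function names (`rbf`, `log_marginal_likelihood`, `loo_log_likelihood`) and the `noise_var` default are illustrative choices, not taken from the original post.

```python
import numpy as np

def rbf(xa, xb, length_scale):
    """Squared-exponential kernel between two sets of 1-D inputs (assumed kernel)."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def log_marginal_likelihood(x, y, length_scale, noise_var=0.1):
    """Return log p(y | x, length_scale) and its derivative w.r.t. length_scale."""
    n = len(x)
    K_f = rbf(x, x, length_scale)
    K = K_f + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    # -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi); 1/2 log|K| = sum(log diag L)
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)
    # dK/d(length_scale) for the squared-exponential kernel
    d2 = (x[:, None] - x[None, :]) ** 2
    dK = K_f * d2 / length_scale**3
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    # d lml / d theta = 1/2 tr((alpha alpha^T - K^{-1}) dK/d theta)
    grad = 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK)
    return lml, grad

def loo_log_likelihood(x, y, length_scale, noise_var=0.1):
    """Sum of leave-one-out log predictive densities, using the K^{-1} shortcut."""
    K = rbf(x, x, length_scale) + noise_var * np.eye(len(x))
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    diag = np.diag(K_inv)
    mu = y - alpha / diag        # LOO predictive mean for each held-out point
    var = 1.0 / diag             # LOO predictive variance (includes noise)
    return np.sum(-0.5 * np.log(var) - 0.5 * (y - mu) ** 2 / var
                  - 0.5 * np.log(2 * np.pi))
```

The LOO function avoids refitting the model n times: the held-out mean and variance for every point are read off from a single inverse of the full covariance matrix.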
The optimization itself can be carried out with a gradient-based algorithm or an exhaustive search.
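The two optimization strategies can be sketched as follows, again under assumptions that are not from the original post: a squared-exponential kernel with unit signal variance, 1-D inputs, a fixed noise variance, and the length-scale as the only hyper-parameter. The exhaustive search simply evaluates the log marginal likelihood on a user-supplied grid; the gradient-based variant is a plain gradient ascent on the log length-scale with step halving, chosen here for simplicity rather than because the post uses it. All function names are hypothetical.

```python
import numpy as np

def lml_and_grad(x, y, length_scale, noise_var=0.1):
    """Log marginal likelihood and its derivative w.r.t. the length-scale."""
    n = len(x)
    d2 = (x[:, None] - x[None, :]) ** 2
    K_f = np.exp(-0.5 * d2 / length_scale**2)
    K = K_f + noise_var * np.eye(n)
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    _, logdet = np.linalg.slogdet(K)
    lml = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)
    dK = K_f * d2 / length_scale**3
    grad = 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK)
    return lml, grad

def exhaustive_search(x, y, grid):
    """Evaluate the objective at every grid value and keep the best one."""
    values = np.array([lml_and_grad(x, y, ls)[0] for ls in grid])
    best = int(np.argmax(values))
    return grid[best], values[best]

def gradient_ascent(x, y, init, step=0.1, n_iter=200, tol=1e-8):
    """Gradient ascent on log(length_scale); the step is halved until it improves."""
    log_ls = np.log(init)
    value, grad = lml_and_grad(x, y, np.exp(log_ls))
    for _ in range(n_iter):
        g_log = grad * np.exp(log_ls)   # chain rule: d/d(log l) = l * d/dl
        t = step
        while t > 1e-12:
            cand = log_ls + t * g_log
            new_value, new_grad = lml_and_grad(x, y, np.exp(cand))
            if new_value > value:
                break                    # accept this step
            t *= 0.5                     # otherwise shrink the step and retry
        else:
            break                        # no improving step found: stop
        gain = new_value - value
        log_ls, value, grad = cand, new_value, new_grad
        if gain < tol:
            break
    return np.exp(log_ls), value
```

For example, `exhaustive_search(x, y, np.linspace(0.2, 5.0, 40))` returns the best grid value and its log marginal likelihood, while `gradient_ascent(x, y, 0.3)` refines a starting guess. The grid search is only as fine as the grid and scales poorly with the number of hyper-parameters; the gradient method can stop at a local maximum, so its result depends on the starting point.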
Result - marginal likelihood (exhaustive search)
Result - marginal likelihood (gradient descent)
Code - main (marginal likelihood - exhaustive search)