class: center, middle, inverse, title-slide # A review of Generalized Linear Mixed Models ## TMB Training Session II ### Andrea Havron
NOAA Fisheries, OST --- layout: true .footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service] <style type="text/css"> code.cpp{ font-size: 14px; } code.r{ font-size: 14px; } </style> --- # Where to get everything .column-60[ <img src="data:image/png;base64,#static/tmb_training_screenshot.png" width="90%" style="display: block; margin: auto auto auto 0;" /> ] .column-40[ All slides and code are at:<br> [NOAA-FIMS/TMB_training](https://github.com/NOAA-FIMS/TMB_training) <img src="data:image/png;base64,#static/SessionII_screenshot.png" width="90%" style="display: block; margin: auto auto auto 0;" /> ] --- # Session II Agenda <br> **Day 1**:<br> - Review of Latent Variable Models - AR1 Process and Sparsity - C++ Review <br> **Day 2**:<br> - Uncertainty and Bias Correction - Model Validation - Modular TMB --- # Latent Variable Models .p[ Data originate from a process with multiple sources of variability, not all of which are observable - Causes: - Unobserved variability caused by grouping structure - Unobserved spatial and/or temporal process - Statistical consequence: - Data are not iid - Includes: - mixed-effects models - random-effects models - hierarchical models - state-space models ] --- # Fixed versus Random Effects .pull-left[ <br> **y = swim speed** <br> `\(\eta = \beta0 + \beta_{1} temp + \beta_{tank}\)` <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Linear </th> <th style="text-align:left;"> Generalized.Linear </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> FE Model </td> <td style="text-align:left;"> `\(y \sim Normal(\eta, \sigma)\)` </td> <td style="text-align:left;"> `\(y \sim Gamma(1/CV, \eta CV)\)` </td> </tr> <tr> <td style="text-align:left;"> Mixed Model </td> <td style="text-align:left;"> `\(y|tank \sim Normal(\eta, \sigma_{y})\)` `\(tank \sim Normal(0, \sigma_{RE})\)` </td> <td style="text-align:left;"> `\(y|tank \sim Gamma(1/CV, \eta CV)\)` `\(tank \sim Normal(0, \sigma_{RE})\)` </td> </tr> </tbody> </table> ] .pull-right-narrow[ <img src="data:image/png;base64,#static/glm-glmm-table.png" width="60%" style="display: block; margin: auto auto auto 0;" /> ] --- ## Joint, Conditional, and Marginal Probability .pull-left[ <br> **Joint Probability**: `$$P(A\cap B) = P(A,B)$$` ] .pull-right[ ![](data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-5-1.png)<!-- --> ] --- ## Joint, Conditional, and Marginal Probability .pull-left[ <br> **Joint Probability**: `$$P(A\cap B) = P(A,B)$$`<br> **Marginal Probability**: `$$P(B)$$` ] .pull-right[ ![](data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-6-1.png)<!-- --> ] --- ## Joint, Conditional, and Marginal Probability .pull-left[ <br> **Joint Probability**: `$$P(A\cap B) = P(A,B)$$`<br> **Marginal Probability**: `$$P(B)$$`<br> **Conditional Probability**: `$$P(A|B)$$`<br> `$$P(A|B) = \frac{P(A,B)}{P(B)}$$` ] .pull-right[ ![](data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-7-1.png)<!-- --> ] --- # Continuous examples <br> .pull-left[ **Joint density** `$$f(A,B)$$` ] .pull-right[ <img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto auto auto 0;" /> ] --- # Continuous examples <br> .pull-left[ **Joint density** `$$f(A,B)$$` **Marginal density** `$$f(A)$$` ] .pull-right[ <img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto auto auto 0;" /> ] --- # Continuous examples <br> .pull-left[ **Joint density** `$$f(A,B)$$` **Marginal density** `$$f(B)$$` **Conditional density** `$$f(A|B=b) = \frac{f(A,B)}{f(B)}$$` ] .pull-right[ <img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto auto auto 0;" /> ] --- # Marginalization - Discrete <br> .pull-left[ `\begin{align} P(B) &= \sum^{k}_{i=1}P(A_{i},B)\\ &= \sum^{K}_{i=1}P(B|A_{i})P(A_{i}) \end{align}` ] .pull-right[ <img src="data:image/png;base64,#static/law-total-prob.png" width="90%" style="display: block; margin: auto auto auto 0;" /> ] --- # Marginalization - Continuous <br> .pull-left[ `\begin{align} f(B) &= \int_{\mathbb{R}} f(A,B)dA\\ &\\ &= \int_{\mathbb{R}}f(B|A)f(A)dA \end{align}` ] .pull-right[ <img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-12-1.png" width="90%" style="display: block; margin: auto auto auto 0;" /> ] --- # Likelihood of a Hierarchical Model Data: `\(y\)`<br> Random effect: `\(u\)`<br> Parameters: `\(\Theta = (\theta_{y}, \theta_{u})\)` .pull-left[ **Bayesian Inference** `\begin{align} L(\Theta|y) &= \frac{f(y,u,\Theta)}{f(y)}\\ L(\Theta|y) &= \frac{f(y|u,\theta_{y})f(u|\theta_{u})f(\Theta)}{f(y)}\\ &\\ &= \frac{f(y|u,\theta_{y})f(u|\theta_{u})f(\Theta)}{\int_{\mathbb{R}}f(y,u,\Theta)dud\Theta} \end{align}` - Posterior density of `\(\Theta\)` ] .pull-right[ **MLE Inference** `\begin{align} L(\Theta) &= \int_{\mathbb{R}}f(y,u;\Theta)du\\ &\\ &= \int_{\mathbb{R}}f(y|u; \theta_{y})f(u;\theta_{u})du \end{align}` <br><br> - Point estimate of `\(\Theta\)` with confidence intervals ] --- # Likelihood of a Hierarchical Model Data: `\(y\)`<br> Random effect: `\(u\)`<br> Parameters: `\(\Theta = (\theta_{y}, \theta_{u})\)` .pull-left[ **Bayesian Inference** `$$L(\Theta|y) = \frac{f(y,u,\Theta)}{f(y)}$$` - Exact - Conjugate Priors - Numerical Integration - Approximations: - MCMC Simulations - Metropolis Hastings, Gibbs Sampler, Hamiltonian Monte Carlo - Integrated Nested Laplace Approximation (INLA) ] .pull-right[ **MLE Inference** `$$L(\Theta) = \int_{\mathbb{R}}f(y,u;\Theta)du$$` - Exact - Newton-Rhapsom Algorithm - Expectation-Maximization Algorithm - Numerical integration - Approximations: - Penalized Likelihood - Laplace approximation - Gauss-Hermite quadrature ] --- # Likelihood of a Hierarchical Model .pull-left[ **Bayesian Inference** `$$L(\Theta|y) = \frac{f(y,u,\Theta)}{f(y)}$$` ] .pull-right[ **MLE Inference** `$$L(\Theta) = \int_{\mathbb{R}}f(y,u;\Theta)du$$` ]<br> <img src="data:image/png;base64,#static/laplace-accuracy.png" width="50%" style="display: block; margin: auto;" /> <br> Figure from [Albertsen, C. M. (2018), 2.3.1](https://backend.orbit.dtu.dk/ws/portalfiles/portal/157133664/Publishers_version.pdf) ] --- # The Laplace approximation<br> Changes the problem from integration to optimization <br> The marginal likelihood of the parameters after integrating out random effects, `\(u\)`: `$$L(\theta) = \int_{\mathbb{R}}f(y,u;\theta)f(u;\theta)du$$` 1. The random effects likelihood is minimized to find `\(\hat{u}\)`: `$$\hat{u}(\theta) = \underset{u}{argmin}f(u,\theta)$$` 2. The Hessian (matrix of partial 2nd derivatives) of `\(f(y,u|\theta)\)` w.r.t. `\(u\)` is evaluated at `\(\hat{u}(\theta)\)`: `$$\mathbb{H}(\theta) = f^{"}_{uu}(\hat{u}(\theta), \theta)$$` 3. The Laplace approximation for the marginal likelihood is: `$$L^{*}(\theta) = \sqrt{2\pi}^{n}det(\mathbb{H})^{-1/2}f(y,\hat{u}; \theta)$$` --- # Separability If `\(y_{i}\)` is only dependent on `\(u_{i}\)`, then the likelihood function is **separable** `\begin{align} L(\theta) &= \int_{\mathbb{R}^{n}}f(y,u;\theta)du\\ &= \prod^{n}_{i=1}\int f(y_{i}, u_{i};\theta)du_{i} \end{align}` The marginal likelihood can be evaluated as a series of univariate Laplace approximations: `\begin{align} L^{*}(\theta) &= \sqrt{2\pi}^{n}det(\mathbb{H})^{-1/2}f(y,\hat{u}; \theta)\\ &\\ &= \prod^{n}_{i=1}f(y_{i}, \hat{u}_{i};\theta)\sqrt{\frac{2\pi}{(-f''(y_{i},\hat{u_{i}};\theta))^{-1}}} \end{align}` Under separability, the Hessian, `\(\mathbb{H}\)`, is sparse --- # Sparsity .pull-left[ Theta logistic state-space model: `\begin{align} u_{t} &= u_{t-1} + r_{0}\bigg(1 - \Big(\frac{exp(u_{t-1})}{K}\Big)^{\psi}\bigg) \\ &\\ u_{t} &\sim N(0, \sigma_{u})\\ y_{t} &\sim N(u_{t}, \sigma_{y}) \end{align}` ```r TMB::runExample("thetalog", exfolder = "../../adcomp/tmb_examples") obj <- MakeADFun(data, parameters, random=c("X"), DLL="thetalog") Matrix::image(obj$env$spHess(random=TRUE)) ``` <br> Example from [TMB's comprehensive documentation](https://kaskr.github.io/adcomp/_book/Sparsity.html#the-theta-logistic-example) gitbook ] .pull-right[ <img src="data:image/png;base64,#static/state-space-dag.png" width="65%" style="display: block; margin: auto;" /> <img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-16-1.png" width="65%" style="display: block; margin: auto;" /> ] --- # Implementing Sparsity in TMB <br> .pull-left[ - Dependency on Eigen (C++ library) for sparse data types - Automatically detects sparsity in the Hessian - 'Optimizes' the computational graph based on sparsity patterns - A sparse Cholesky factorization is used to evaluate the sparse Hessian ] .pull-right[ **Bayesian Sparsity Comparison**<br> Sparse spatial model run time (min) |Sample Size|tmbstan|Stan| |-----------|-------|----| | 32 |5.9 |1072| | 64 |7.5 |1716| | 128 |8.7 | | ] --- # Implementing Sparsity in TMB <br> **Data Types** - [**SparseMatrix**](https://eigen.tuxfamily.org/dox/group__TutorialSparse.html) - Declares `\(\text{Eigen::SparseMatrix<Type>}\)` from the Eigen sparse module **TMB macro** - [**DATA_SPARSE_MATRIX**](https://kaskr.github.io/adcomp/group__macros.html#ga75dbde3f78d762602b7ffc0f19a99e1e) - Get sparse matrix from R and declare it as `\(\text{Eigen::SparseMatrix<Type>}\)` --- # Implementing Sparsity in TMB - Classes are specified in the density namespace - Note NEGATIVE log-likelihood is returned **Density functions** <br> [**AR1**](https://kaskr.github.io/adcomp/classdensity_1_1AR1__t.html) `\(\quad\quad\quad\quad~~\)` `\(\text{class density::AR1_t<distribution>}\)` - Class to evaluate the negative log density of a (multivariate) AR1 process with parameter phi and given marginal distribution. [**GMRF**](https://kaskr.github.io/adcomp/classdensity_1_1GMRF__t.html) `\(\quad\quad\quad~~\)` `\(\text{class density::GMRF_t<distribution>}\)` - Class to evaluate the negative log density of a mean zero multivariate normal distribution with a sparse precision matrix [**SEPARABLE**](https://kaskr.github.io/adcomp/classdensity_1_1SEPARABLE__t.html) `\(\quad\)` `\(\text{density::SEPARABLE_t<distribution1, distribution2>}\)` - Take two densities f and g, and construct the density of their separable extension --- # Applications in Fisheries [WHAM](https://github.com/timjmiller/wham/blob/master/src/wham_v0.cpp) examples [Multivariate random effect for mortality, age by year, using 2D AR1 density:](https://github.com/timjmiller/wham/blob/97577f1d619607ec40fdb8d3097f30668af16999/src/wham_v0.cpp#L483) ```cpp PARAMETER_ARRAY(M_re); // random effects for year- and age-varying M deviations from mean M_a), dim = n_years x n_M_a ... nll_M += SCALE(SEPARABLE(AR1(rho_M_a),AR1(rho_M_y)), Sigma_M)(M_re); ``` [Multivariate random effect for numbers at age, age by year, using 2D AR1 density:](https://github.com/timjmiller/wham/blob/97577f1d619607ec40fdb8d3097f30668af16999/src/wham_v0.cpp#L934) ```cpp array<Type> NAA_devs(n_years_model+n_years_proj-1, n_ages); ... nll_NAA += SEPARABLE(VECSCALE(AR1(NAA_rho_a), sigma_a_sig),AR1(NAA_rho_y))(NAA_devs); ```