A review of Generalized Linear Mixed Models

class: center, middle, inverse, title-slide

# A review of Generalized Linear Mixed Models
## TMB Training Session II
### Andrea Havron<br>NOAA Fisheries, OST

---

layout: true

.footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service]

code.cpp{
  font-size: 14px;
}
code.r{
  font-size: 14px;
}

</style>

---
# Where to get everything

.column-60[
<img src="data:image/png;base64,#static/tmb_training_screenshot.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]

.column-40[
All slides and code are at:<br> [NOAA-FIMS/TMB_training](https://github.com/NOAA-FIMS/TMB_training)
<img src="data:image/png;base64,#static/SessionII_screenshot.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]
---
# Session II Agenda
<br>

**Day 1**:<br>
- Review of Latent Variable Models
- AR1 Process and Sparsity
- C++ Review
<br>

**Day 2**:<br>
- Uncertainty and Bias Correction
- Model Validation
- Modular TMB

---
# Latent Variable Models

.p[
Data originate from a process with multiple sources of variability, not all of which are observable
- Causes:
  - Unobserved variability caused by grouping structure
  - Unobserved spatial and/or temporal process
- Statistical consequence: 
  - Data are not iid
- Includes: 
  - mixed-effects models
  - random-effects models
  - hierarchical models
  - state-space models
]

---
# Fixed versus Random Effects

.pull-left[
<br>
**y = swim speed** <br>
`$\eta = \beta0 + \beta_{1} temp + \beta_{tank}$`
<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:left;"> Linear </th>
   <th style="text-align:left;"> Generalized.Linear </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> FE Model </td>
   <td style="text-align:left;"> `$y \sim Normal(\eta, \sigma)$` </td>
   <td style="text-align:left;"> `$y \sim Gamma(1/CV, \eta CV)$` </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Mixed Model </td>
   <td style="text-align:left;"> `$y|tank \sim Normal(\eta, \sigma_{y})$` `$tank \sim Normal(0, \sigma_{RE})$` </td>
   <td style="text-align:left;"> `$y|tank \sim Gamma(1/CV, \eta CV)$` `$tank \sim Normal(0, \sigma_{RE})$` </td>
  </tr>
</tbody>
</table>
]

.pull-right-narrow[
<img src="data:image/png;base64,#static/glm-glmm-table.png" width="60%" style="display: block; margin: auto auto auto 0;" />
]
---
## Joint, Conditional, and Marginal Probability
.pull-left[
<br>
**Joint Probability**: `$$P(A\cap B) = P(A,B)$$`

]

.pull-right[
![](data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-5-1.png)
]

---
## Joint, Conditional, and Marginal Probability
.pull-left[
<br>
**Joint Probability**: `$$P(A\cap B) = P(A,B)$$`<br>

**Marginal Probability**: `$$P(B)$$`

]

.pull-right[
![](data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-6-1.png)
]

---
## Joint, Conditional, and Marginal Probability
.pull-left[
<br>
**Joint Probability**: `$$P(A\cap B) = P(A,B)$$`<br>

**Marginal Probability**: `$$P(B)$$`<br>

**Conditional Probability**: `$$P(A|B)$$`<br>

`$$P(A|B) = \frac{P(A,B)}{P(B)}$$`

]

.pull-right[
![](data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-7-1.png)
]
---
# Continuous examples
<br>

.pull-left[
**Joint density** `$$f(A,B)$$`
]

.pull-right[
<img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]

---
# Continuous examples
<br>

.pull-left[
**Joint density** `$$f(A,B)$$`
**Marginal density** `$$f(A)$$`
]

.pull-right[
<img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]

---
# Continuous examples
<br>

.pull-left[
**Joint density** `$$f(A,B)$$`
**Marginal density** `$$f(B)$$`
**Conditional density**
`$$f(A|B=b) = \frac{f(A,B)}{f(B)}$$`
]

.pull-right[
<img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]
---
# Marginalization - Discrete
<br>
.pull-left[
`\begin{align}
P(B) &= \sum^{k}_{i=1}P(A_{i},B)\\
&= \sum^{K}_{i=1}P(B|A_{i})P(A_{i})
\end{align}`
]

.pull-right[
<img src="data:image/png;base64,#static/law-total-prob.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]
---
# Marginalization - Continuous
<br>
.pull-left[
`\begin{align}
f(B) &= \int_{\mathbb{R}} f(A,B)dA\\
&\\
&= \int_{\mathbb{R}}f(B|A)f(A)dA
\end{align}`
]

.pull-right[
<img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-12-1.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]
---

# Likelihood of a Hierarchical Model

Data: `$y$`<br>
Random effect: `$u$`<br>
Parameters: `$\Theta = (\theta_{y}, \theta_{u})$`

.pull-left[
**Bayesian Inference**
`\begin{align}
L(\Theta|y) &= \frac{f(y,u,\Theta)}{f(y)}\\
L(\Theta|y) &= \frac{f(y|u,\theta_{y})f(u|\theta_{u})f(\Theta)}{f(y)}\\
&\\
&= \frac{f(y|u,\theta_{y})f(u|\theta_{u})f(\Theta)}{\int_{\mathbb{R}}f(y,u,\Theta)dud\Theta}
\end{align}`

- Posterior density of `$\Theta$`
]

.pull-right[
**MLE Inference** 
`\begin{align}
L(\Theta) &= \int_{\mathbb{R}}f(y,u;\Theta)du\\
&\\
&= \int_{\mathbb{R}}f(y|u; \theta_{y})f(u;\theta_{u})du
\end{align}`
<br><br>

- Point estimate of `$\Theta$` with confidence intervals
]

---

# Likelihood of a Hierarchical Model

Data: `$y$`<br>
Random effect: `$u$`<br>
Parameters: `$\Theta = (\theta_{y}, \theta_{u})$`

.pull-left[
**Bayesian Inference**
`$$L(\Theta|y) = \frac{f(y,u,\Theta)}{f(y)}$$`

- Exact
 - Conjugate Priors
 - Numerical Integration
- Approximations:
 - MCMC Simulations - Metropolis Hastings, Gibbs Sampler, Hamiltonian Monte Carlo
 - Integrated Nested Laplace Approximation (INLA)
]

.pull-right[
**MLE Inference** 
`$$L(\Theta) = \int_{\mathbb{R}}f(y,u;\Theta)du$$`

- Exact
 - Newton-Rhapsom Algorithm
 - Expectation-Maximization Algorithm
 - Numerical integration
- Approximations:
 - Penalized Likelihood
 - Laplace approximation
 - Gauss-Hermite quadrature

]

---
# Likelihood of a Hierarchical Model

.pull-left[
**Bayesian Inference**
`$$L(\Theta|y) = \frac{f(y,u,\Theta)}{f(y)}$$`
]

.pull-right[
**MLE Inference** 
`$$L(\Theta) = \int_{\mathbb{R}}f(y,u;\Theta)du$$`

]<br>
<img src="data:image/png;base64,#static/laplace-accuracy.png" width="50%" style="display: block; margin: auto;" />
<br>
Figure from [Albertsen, C. M. (2018), 2.3.1](https://backend.orbit.dtu.dk/ws/portalfiles/portal/157133664/Publishers_version.pdf)
]
---
# The Laplace approximation<br>

Changes the problem from integration to optimization <br>

The marginal likelihood of the parameters after integrating out random effects, `$u$`: 
`$$L(\theta) = \int_{\mathbb{R}}f(y,u;\theta)f(u;\theta)du$$`
1. The random effects likelihood is minimized to find `$\hat{u}$`:
`$$\hat{u}(\theta) = \underset{u}{argmin}f(u,\theta)$$`
2. The Hessian (matrix of partial 2nd derivatives) of `$f(y,u|\theta)$` w.r.t. `$u$` is evaluated at `$\hat{u}(\theta)$`:
`$$\mathbb{H}(\theta) = f^{"}_{uu}(\hat{u}(\theta), \theta)$$`
3. The Laplace approximation for the marginal likelihood is:
`$$L^{*}(\theta) = \sqrt{2\pi}^{n}det(\mathbb{H})^{-1/2}f(y,\hat{u}; \theta)$$`
---
# Separability

If `$y_{i}$` is only dependent on `$u_{i}$`, then the likelihood function is **separable**

`\begin{align}
L(\theta) &= \int_{\mathbb{R}^{n}}f(y,u;\theta)du\\
&= \prod^{n}_{i=1}\int f(y_{i}, u_{i};\theta)du_{i}
\end{align}`

The marginal likelihood can be evaluated as a series of univariate Laplace approximations:
`\begin{align}
L^{*}(\theta) &= \sqrt{2\pi}^{n}det(\mathbb{H})^{-1/2}f(y,\hat{u}; \theta)\\
&\\
&= \prod^{n}_{i=1}f(y_{i}, \hat{u}_{i};\theta)\sqrt{\frac{2\pi}{(-f''(y_{i},\hat{u_{i}};\theta))^{-1}}}
\end{align}`

Under separability, the Hessian, `$\mathbb{H}$`, is sparse
---
# Sparsity

.pull-left[

Theta logistic state-space model:
`\begin{align}
u_{t} &= u_{t-1} + r_{0}\bigg(1 - \Big(\frac{exp(u_{t-1})}{K}\Big)^{\psi}\bigg) \\
&\\
u_{t} &\sim N(0, \sigma_{u})\\
y_{t} &\sim N(u_{t}, \sigma_{y})
\end{align}`

```r
TMB::runExample("thetalog", 
                exfolder = "../../adcomp/tmb_examples")
obj <- MakeADFun(data, parameters, random=c("X"), 
                 DLL="thetalog")
Matrix::image(obj$env$spHess(random=TRUE))
```
<br>
Example from [TMB's comprehensive documentation](https://kaskr.github.io/adcomp/_book/Sparsity.html#the-theta-logistic-example) gitbook
]

.pull-right[
<img src="data:image/png;base64,#static/state-space-dag.png" width="65%" style="display: block; margin: auto;" />

<img src="data:image/png;base64,#LMV_files/figure-html/unnamed-chunk-16-1.png" width="65%" style="display: block; margin: auto;" />
]
---
# Implementing Sparsity in TMB
<br>

.pull-left[
- Dependency on Eigen (C++ library) for sparse data types
- Automatically detects sparsity in the Hessian
- 'Optimizes' the computational graph based on sparsity patterns
- A sparse Cholesky factorization is used to evaluate the sparse Hessian
]

.pull-right[
**Bayesian Sparsity Comparison**<br>
Sparse spatial model run time (min)

|Sample Size|tmbstan|Stan|
|-----------|-------|----|
| 32        |5.9    |1072|
| 64        |7.5    |1716|
| 128       |8.7    |    |

]

---
# Implementing Sparsity in TMB
<br>
**Data Types**

- [**SparseMatrix**](https://eigen.tuxfamily.org/dox/group__TutorialSparse.html)
 - Declares `$\text{Eigen::SparseMatrix<Type>}$` from the Eigen sparse module

**TMB macro**

- [**DATA_SPARSE_MATRIX**](https://kaskr.github.io/adcomp/group__macros.html#ga75dbde3f78d762602b7ffc0f19a99e1e)
 - Get sparse matrix from R and declare it as `$\text{Eigen::SparseMatrix<Type>}$`
 
---
# Implementing Sparsity in TMB

- Classes are specified in the density namespace
- Note NEGATIVE log-likelihood is returned

**Density functions** <br>
[**AR1**](https://kaskr.github.io/adcomp/classdensity_1_1AR1__t.html) `$\quad\quad\quad\quad~~$` `$\text{class density::AR1_t<distribution>}$`
- Class to evaluate the negative log density of a (multivariate) AR1 process with parameter phi and given marginal distribution.
  
[**GMRF**](https://kaskr.github.io/adcomp/classdensity_1_1GMRF__t.html) `$\quad\quad\quad~~$` `$\text{class density::GMRF_t<distribution>}$`
- Class to evaluate the negative log density of a mean zero multivariate normal distribution with a sparse precision matrix

[**SEPARABLE**](https://kaskr.github.io/adcomp/classdensity_1_1SEPARABLE__t.html) `$\quad$` `$\text{density::SEPARABLE_t<distribution1, distribution2>}$`
- Take two densities f and g, and construct the density of their separable extension

---
# Applications in Fisheries

[WHAM](https://github.com/timjmiller/wham/blob/master/src/wham_v0.cpp) examples

[Multivariate random effect for mortality, age by year, using 2D AR1 density:](https://github.com/timjmiller/wham/blob/97577f1d619607ec40fdb8d3097f30668af16999/src/wham_v0.cpp#L483)

```cpp
PARAMETER_ARRAY(M_re); // random effects for year- and age-varying M deviations from mean M_a), dim = n_years x n_M_a
...
nll_M += SCALE(SEPARABLE(AR1(rho_M_a),AR1(rho_M_y)), Sigma_M)(M_re);
```

[Multivariate random effect for numbers at age, age by year, using 2D AR1 density:](https://github.com/timjmiller/wham/blob/97577f1d619607ec40fdb8d3097f30668af16999/src/wham_v0.cpp#L934)

```cpp
array<Type> NAA_devs(n_years_model+n_years_proj-1, n_ages);
...
nll_NAA += SEPARABLE(VECSCALE(AR1(NAA_rho_a), sigma_a_sig),AR1(NAA_rho_y))(NAA_devs);
```