class: center, middle, inverse, title-slide

# TMB 101
## FIMS Implementation Team Workshop
### Andrea Havron
NOAA Fisheries, OST
2022-03-23

---
layout: true

.footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service]

<style type="text/css">
code.cpp{ font-size: 14px; }
code.r{ font-size: 14px; }
</style>

<!-- modified from Cole Monnahan's TMB training series: https://github.com/colemonnahan/tmb_workshop-->

---
# What is TMB?
<br><br>

* TMB is an R package and environment for fitting statistical models
* Models are written in C++ and run from R
* TMB can implement:
  * Maximum Likelihood Inference
  * Random Effects Modeling
  * Bayesian Hierarchical Modeling
* TMB performs automatic differentiation
  * Calculates accurate higher-order derivatives
  * Gradients (derivative vectors) are used for faster optimization

---
class: middle

# A refresher on Maximum Likelihood Inference

---
# ML Inference

What is the likelihood for 30 successes in 100 trials?

.pull-left[
1. Specify the model
<br><br>
`\(y \sim Binomial(n, p)\)`
]

---
# ML Inference

What is the likelihood for 30 successes in 100 trials?

.pull-left[
1. Specify the model
2. Calculate the likelihood<br><br>
`\(L(p; n, y) = \frac{n!}{y!(n-y)!}p^y(1-p)^{n-y}\)`
]

.pull-right[
`\(L(p; n = 100, y = 30)\)`
![](TMB101_files/figure-html/unnamed-chunk-1-1.png)
]

---
# ML Inference

What is the likelihood for 30 successes in 100 trials?

.pull-left[
1. Specify the model
2. Calculate the likelihood
3. Calculate the negative log-likelihood<br><br>
`\(-\ell(p; n, y) = -[ln\big(\frac{n!}{y!(n-y)!}\big) + yln(p)\)`<br>
`$$+ (n-y)ln(1-p)]$$`
]

.pull-right[
`\(-ln\big[L(p; n = 100, y = 30)\big]\)`
![](TMB101_files/figure-html/unnamed-chunk-2-1.png)
]

---
# ML Inference

What is the likelihood for 30 successes in 100 trials?

.pull-left[
1. Specify the model
2. Calculate the likelihood
3. Calculate the negative log-likelihood
`\(-\ell(p; n, y) = -[ln\big(\frac{n!}{y!(n-y)!}\big) + yln(p)\)`<br>
`$$+ (n-y)ln(1-p)]$$`
4. Calculate the derivative w.r.t. `\(p\)`<br><br>
`\(\frac{d(\ell(p; n, y))}{dp} = \frac{y}{p}- \frac{n-y}{1-p}\)`
]

.pull-right[
`\(-ln\big[L(p; n = 100, y = 30)\big]\)`
![](TMB101_files/figure-html/unnamed-chunk-3-1.png)
]

---
# ML Inference

What is the likelihood for 30 successes in 100 trials?

.pull-left[
1. Specify the model
2. Calculate the likelihood
3. Calculate the negative log-likelihood
4. Calculate the derivative w.r.t. `\(p\)`
5. Set to 0 and solve for the MLE<br><br>
for n = 100 and y = 30,
`\begin{align}
0 &= \frac{y}{p}- \frac{n-y}{1-p}\\
\hat{p} &= \frac{y}{n} \\
\hat{p} &= \frac{30}{100} = 0.3
\end{align}`
(verified numerically on the next slide)
]

.pull-right[
`\(-ln\big[L(p; n = 100, y = 30)\big]\)`
![](TMB101_files/figure-html/unnamed-chunk-4-1.png)
]
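---
# ML Inference: numeric check

The closed-form solution can be confirmed numerically in base R. A minimal sketch (not from the original workshop code), minimizing the binomial negative log-likelihood with optimize():

```r
# Binomial negative log-likelihood for y = 30 successes in n = 100 trials
nll <- function(p, n = 100, y = 30) {
  -dbinom(y, size = n, prob = p, log = TRUE)
}

# Minimize over p in (0, 1); the minimizer is ~0.3, matching p-hat = y/n
optimize(nll, interval = c(0.001, 0.999))$minimum
```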
---
class: middle

# Uncertainty

---
## Asymptotic approximation with `\(f''\)`

The second derivative measures the curvature of the likelihood and, evaluated at the MLE, is approximately equal to the negative inverse of the variance.

Poisson likelihood: `\(f(y) = \frac{e^{-\lambda}\lambda^{y}}{y!}\)`, for a sample of `\(n\)` observations (plots show a single observation, **y = 2**):

.three-column[
log-likelihood:
`\(\small \ell(\lambda) = -n\lambda + \sum y_{i}log(\lambda) - \sum log(y_{i}!)\)`
![](TMB101_files/figure-html/unnamed-chunk-5-1.png)
]
.three-column[
1st derivative:
`\(\frac{d\ell(\lambda)}{d\lambda} = -n + \frac{\sum y_{i}}{\lambda}\)`
![](TMB101_files/figure-html/unnamed-chunk-6-1.png)
`\(\hat{\lambda} = \frac{\sum y_{i}}{n} = \bar{y}\)`
]
.three-column[
2nd derivative:
`\(\frac{d^{2}\ell(\lambda)}{d\lambda^{2}} = -\frac{\sum y_{i}}{\lambda^{2}}\)`

Evaluated at the MLE:
`$$-\frac{n\lambda}{\lambda^{2}} = -\frac{n}{\lambda}$$`
`$$Var(\hat{\lambda}) = \frac{\lambda}{n}$$`
]

---
# Multivariate asymptotics

* For N-d models, the curvature is represented by an NxN **Hessian** matrix of 2nd partial derivatives
* Inverting the negative Hessian gives us a covariance matrix

`\begin{align}
(\mathbb{H}_{f})_{i,j} &= \frac{\partial^{2}f}{\partial\theta_{i}\partial\theta_{j}}, &
Cov(\hat{\Theta}) &= (-\mathbb{H}_{f})^{-1}
\end{align}`

.three-column[
![](static/mvn1.png)
]
.three-column[<br><br>Which will have the smaller SE?]
.three-column[
![](static/mvn2.png)
]

---
class: middle

# Maximum Likelihood Inference with TMB

---
# ML with TMB:
<br><br>

1. Write a C++ model to specify the negative log-likelihood for a set of parameters and data
2. Compile the model and link it to R
3. Construct the C++ objective function with derivatives
4. Pass the following to an R minimizer:
  * objective function: specified by the user in C++
  * initial parameters and data: specified by the user in R
  * gradient functions: calculated by TMB based on the objective function
5. At each step of minimization:
  * Parameter values are updated
  * The negative log-likelihood is calculated
  * The gradient functions return the gradients (vector of derivatives) based on the new parameter values
6. Model convergence is reached when:
  * The gradients are near zero
  * The negative log-likelihood is at a minimum

(a minimal R sketch of this workflow follows on the next slide)
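---
# ML with TMB: workflow sketch

A minimal R sketch of the steps above, assuming the linReg.cpp model shown on the next slide; the file path and simulated data are illustrative:

```r
library(TMB)

# 1-2. Write the C++ model (next slide), compile it, and link it to R
compile("linReg.cpp")
dyn.load(dynlib("linReg"))

# 3-4. Construct the objective function with derivatives, then pass the
#      initial parameters, objective, and gradients to a minimizer
Data <- list(y = rnorm(10) + 1:10, X = cbind(rep(1,10), 1:10))
obj  <- MakeADFun(data = Data,
                  parameters = list(beta = c(0,0), lnSigma = 0),
                  DLL = "linReg")
opt  <- nlminb(obj$par, obj$fn, obj$gr)

# 5-6. At convergence the gradients should be near zero
max(abs(obj$gr(opt$par)))
```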
---
# TMB Model

.pull-left[
linReg.cpp
```cpp
// Simple linear regression
#include <TMB.hpp>
template <class Type>
Type objective_function<Type>::operator()()
{
  DATA_VECTOR(y);
  DATA_MATRIX(X);
  PARAMETER_VECTOR(beta);
  PARAMETER(lnSigma);

  Type nll = 0;
  Type sigma = exp(lnSigma);
  int n = y.size();
  vector<Type> mu = X * beta;
  for(int i=0; i<n; i++){
    nll -= dnorm(y(i), mu(i), sigma, true);
  }
  Type sig2 = pow(sigma,2);
  REPORT(sig2);
  ADREPORT(sig2);
  return nll;
}
```
]
.pull-right[
**C++ preliminaries**
* Lines end in semi-colons
* Everything must be declared
* Type is generic and assigned based on input
* Indexing starts at 0!
* `x -= 1` means `x = x - 1`
* Math operators are similar to R (+,-,/,*)
* Use pow(x,p) for x^p
* if statements cannot be based on a parameter
]

---
# Data: Importing data from R<br><br>

* Pass data to TMB with these 'macros'
* Note: do not specify the object dimension
<br><br>

|TMB Syntax      |C++ Type     |R Type     |
|:---------------|:------------|:----------|
|DATA_VECTOR(x)  |vector<Type> |vector     |
|DATA_MATRIX(x)  |matrix<Type> |matrix     |
|DATA_SCALAR(x)  |Type         |numeric(1) |
|DATA_INTEGER(x) |int          |integer(1) |
|DATA_FACTOR(x)  |vector<int>  |factor     |
|DATA_ARRAY(x)   |array<Type>  |array      |

---
# Data: Importing data from R<br><br>

.pull-left-narrow[
TMB code
```cpp
DATA_VECTOR(y);
DATA_MATRIX(X);
DATA_INTEGER(i);
DATA_FACTOR(ngroup);
```
]
.pull-right-wide[
R script
```r
Data <- list(
  y = c(30.2, 45.3, 12.1),
  X = matrix(0,3,3),
  i = 11,
  ngroup = c(1,1,2,2,2)
)
str(Data)
```

```
## List of 4
##  $ y     : num [1:3] 30.2 45.3 12.1
##  $ X     : num [1:3, 1:3] 0 0 0 0 0 0 0 0 0
##  $ i     : num 11
##  $ ngroup: num [1:5] 1 1 2 2 2
```
]

---
# Declaring model parameters<br><br>

* No PARAMETER_INTEGER
* Again, do not specify the object dimension
<br><br>

|TMB Syntax          |C++ Type     |R Type     |
|:-------------------|:------------|:----------|
|PARAMETER_VECTOR(x) |vector<Type> |vector     |
|PARAMETER_MATRIX(x) |matrix<Type> |matrix     |
|PARAMETER_ARRAY(x)  |array<Type>  |array      |
|PARAMETER(x)        |Type         |numeric(1) |

---
# Declaring model parameters<br><br>

.pull-left-narrow[
TMB code
```cpp
PARAMETER_VECTOR(beta);
PARAMETER(lnSigma);
PARAMETER_MATRIX(u);
```
]
.pull-right-wide[
R script
```r
Pars <- list(
  beta = c(0,0),
  lnSigma = 0,
  u = matrix(0,3,3)
)
str(Pars)
```

```
## List of 3
##  $ beta   : num [1:2] 0 0
##  $ lnSigma: num 0
##  $ u      : num [1:3, 1:3] 0 0 0 0 0 0 0 0 0
```
]

---
# Building a TMB model in R<br><br>

* All C++ models, including TMB models, need to be compiled
* In TMB, this creates a dynamic link library (.dll)
* The .dll file needs to be loaded for R to access it

```r
library(TMB)
compile('../src/linReg.cpp')
```

```
## [1] 0
```

```r
dyn.load(dynlib('../src/linReg'))
```

---
# Building a TMB model in R<br>

* TMB additionally needs to build a computational graph
* This sets up the derivative chain used to determine the gradient functions
* TMB has a static graph, so this step happens before optimization
* The R function MakeADFun() completes this step

```r
#Simulate data
set.seed(123)
Data <- list(y = rnorm(10) + 1:10, X = cbind(rep(1,10), 1:10))
obj <- MakeADFun(data = Data,
                 parameters = list(beta = c(0,0), lnSigma = 0),
                 DLL = 'linReg')
```

MakeADFun output:
* **par**: initial values
* **fn**: the negative log-likelihood function
* **gr**: the gradient function
* **report**: return values wrapped in REPORT()
* **env**: environment with access to all parts of the structure

---
# Building a TMB model in R<br>

* The model is optimized outside of TMB using any R minimizer
* nlminb is a base R minimizer frequently used for TMB models

```r
# Pass initial values, function, and gradient functions
opt <- nlminb(obj$par, obj$fn, obj$gr)
```

Report objects back to R

```r
report <- obj$report()
report$sig2
```

```
## [1] 0.7632995
```

nlminb output:
* **par**: MLEs
* **objective**: the nll associated with the MLEs
* **convergence**: 0 indicates successful convergence
* **message**: convergence message
* **iterations**: number of iterations
* **evaluations**: number of objective and gradient function evaluations

---
# Check model convergence

```r
#Check maximum gradient < 0.0001
obj$gr()
```

```
## outer mgc:  1.38528e-07
```

```
##               [,1]          [,2]         [,3]
## [1,] -3.721396e-08 -7.253913e-08 -1.38528e-07
```

```r
#Check convergence error = 0
opt$convergence
```

```
## [1] 0
```

```r
#Check convergence message
opt$message
```

```
## [1] "relative convergence (4)"
```
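---
# Check model convergence: sanity check

A habit worth adopting (a sketch, not part of the original workshop code): when a closed-form fit exists, cross-check the TMB estimates against it. Here lm() fits the same linear regression to the simulated Data:

```r
# Coefficients should match opt$par[1:2] from the TMB fit
fit <- lm(Data$y ~ Data$X - 1)   # -1 because X already contains the intercept
coef(fit)

# sigma(fit) uses the n - 2 denominator; the TMB MLE exp(opt$par["lnSigma"])
# divides by n, so the two differ by a factor of sqrt((n - 2)/n)
sigma(fit)
exp(opt$par["lnSigma"])
```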
---
# Uncertainty

```r
sdr <- sdreport(obj)
```

```r
#uses the Hessian to calculate parameter uncertainty
summary(sdr, "fixed")
```

```
##           Estimate Std. Error
## beta     0.5254674 0.59683034
## beta     0.9180288 0.09618792
## lnSigma -0.1350524 0.22360672
```

```r
#uses the Delta method to calculate uncertainty for derived quantities
summary(sdr, "report")
```

```
##      Estimate Std. Error
## sig2 0.7632995  0.3413578
```

```r
#Check Hessian is positive definite
sdr$pdHess
```

```
## [1] TRUE
```

---
# Debugging: Error types<br>

.pull-left[
1. **Compile time errors**
2. TMB-R linkage errors
3. Runtime errors
4. Convergence errors
]
.pull-right[
Fails with errors when compiling
* Incorrect syntax: e.g., missing **)** or **;**
* Wrong type, for example:
  * int instead of Type
  * passing a matrix to a function that accepts a vector
* Undeclared variable
]

```cpp
DATA_VECTOR(y                 // missing ")" and ";"
DATA_MATRIX(X);
PARAMETER_VECTOR(beta);
int nll;                      // wrong type: should be Type
matrix<Type> mu = X*beta;     // X*beta is a vector, not a matrix
nll = -dnorm(y,mu,sigma,true) // sigma undeclared; missing ";"
return nll;
```

---
# Debugging: Error types<br>

.pull-left[
1. Compile time errors
2. **TMB-R linkage errors**
3. Runtime errors
4. Convergence errors
]
.pull-right[
Compiles but MakeADFun crashes
* .dll not loaded in R
* Incorrect data/parameter types or missing value
* Dimension mismatch
]

.pull-left[
linReg.cpp
```cpp
DATA_VECTOR(y);
DATA_MATRIX(X);
PARAMETER_VECTOR(beta);
PARAMETER(lnSigma);

Type sigma = exp(lnSigma);
Type nll = 0;
vector<Type> mu = X*beta;
nll -= sum(dnorm(y,mu,sigma,true));
return nll;
```
]
.pull-right[
linReg.R
```r
library(TMB)
compile("linReg.cpp")
# .dll never loaded: dyn.load(dynlib("linReg")) is missing
Data <- list(y = c(-2.1, 3.3, 4.2),
             X = data.frame(c(1,1),c(4,5))) # data.frame, not matrix;
                                            # dimensions don't match y
Pars <- list(beta = 0)                      # beta too short; lnSigma missing
obj <- MakeADFun(data = Data,
                 parameters = Pars,
                 DLL = "linReg")
```
]

---
# Debugging: Error types<br>

.pull-left[
1. Compile time errors
2. TMB-R linkage errors
3. **Runtime errors**
4. Convergence errors
]
.pull-right[
Model fails during minimization
* Data-distribution mismatch
* Incorrect parameter transformation
* Not taking the log of the likelihood
* Illegal mathematical operations, e.g., log(-1)
]

.pull-left[
linReg.cpp
```cpp
DATA_VECTOR(y);
DATA_MATRIX(X);
PARAMETER_VECTOR(beta);
PARAMETER(sigma);               // untransformed; can go negative
Type nll = 0;
vector<Type> mu = log(X*beta);  // log(-1) if X*beta < 0
nll -= sum(dlnorm(y,mu,sigma)); // lognormal vs. negative y; log not taken
return nll;
```
]
.pull-right[
simple.R
```r
library(TMB)
compile("linReg.cpp")
dyn.load(dynlib("linReg"))
Data <- list(y = c(-2.1, 3.3, 4.2),
             X = cbind(c(1,1,1),c(4,5,2)))
Pars <- list(beta = c(0,0), sigma = 0)
obj <- MakeADFun(data = Data,
                 parameters = Pars,
                 DLL = "linReg")
opt <- nlminb(obj$par, obj$fn, obj$gr)
```
]

---
# Debugging: Error types<br>

.pull-left[
1. Compile time errors
2. TMB-R linkage errors
3. Runtime errors
4. **Convergence errors**
]
.pull-right[
Model compiles and runs but fails to converge
* Singular convergence: model is likely overparameterized
* False convergence: likelihood may be discontinuous
* Relative convergence but Hessian not positive definite: singularity
]

---
# Non-Invertible Hessians<br><br>

* The Hessian will **not be invertible** if the MLE is not a true minimum
* When could this occur? Usually mis-specified models
  * Parameters confounded or overparameterized (too complex for the data)
* Confounded model example (a runnable demo follows on the next slide):
```r
x1 <- rnorm(50); x2 <- x1; lm(y~x1+x2)
```
* Why confounded? x2 is an exact copy of x1, so their slopes cannot be separated and lm estimates the slope of x2 as NA
* TMB will warn about non-invertible Hessians (NaN standard errors)
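---
# Non-Invertible Hessians: demo

A runnable sketch of the confounded example above (the simulated response is illustrative, not from the original slides). lm() flags the aliasing directly; in an equivalent TMB model the symptom would instead be sdr$pdHess returning FALSE and NaN standard errors:

```r
set.seed(1)
x1 <- rnorm(50)
x2 <- x1                      # perfectly confounded with x1
y  <- 1 + 2*x1 + rnorm(50)    # illustrative response
coef(lm(y ~ x1 + x2))         # slope of x2 is returned as NA
```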
---
class: middle

# Hierarchical modeling with TMB

---
# The Hierarchical model<br><br><br>

`$$\Large \int_{\mathbb{R}}f(y;u,\theta)f(u;\theta)du$$`

---
# The Laplace approximation<br>

The marginal likelihood of the data after integrating out the random effects, `\(u\)`:
`$$L(\theta) = \int_{\mathbb{R}}f(y;u,\theta)f(u;\theta)du$$`

1. The joint negative log-likelihood, `\(f(u,\theta) = -log\big[f(y;u,\theta)f(u;\theta)\big]\)`, is minimized w.r.t. `\(u\)` to find `\(\hat{u}\)`:
`$$\hat{u}(\theta) = \underset{u}{argmin}\,f(u,\theta)$$`
2. The Hessian (matrix of partial 2nd derivatives) of `\(f(u,\theta)\)` w.r.t. `\(u\)` is evaluated at `\(\hat{u}(\theta)\)`:
`$$\mathbb{H}(\theta) = f''_{uu}(\hat{u}(\theta), \theta)$$`
3. The Laplace approximation of the marginal likelihood is then:
`$$L^{*}(\theta) = \sqrt{2\pi}^{n}det(\mathbb{H}(\theta))^{-1/2}exp\big(-f(\hat{u}(\theta),\theta)\big)$$`

---
# Implementing random effects in TMB

Random effect group term:
`$$u \sim N(0, \sigma_{u})$$`

Observations:
`\begin{align}
\mu &= X\beta + Bu\\
y &\sim N(\mu,\sigma_{y})
\end{align}`

---
# Implementing RE in TMB

.pull-left[
```cpp
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator()()
{
  DATA_VECTOR(y);
  DATA_SPARSE_MATRIX(B);
  DATA_SPARSE_MATRIX(X);
  PARAMETER_VECTOR(u);
  PARAMETER_VECTOR(beta);
  PARAMETER(lnSDu);
  PARAMETER(lnSDy);

  Type nll = 0;
  Type sdu = exp(lnSDu);
  nll -= dnorm(u, Type(0), sdu, true).sum();

  vector<Type> mu = X * beta + B * u;
  Type sdy = exp(lnSDy);
  nll -= dnorm(y, mu, sdy, true).sum();

  return nll;
}
```
]
.pull-right[
Simulate Data:
```r
set.seed(123)
yr <- rep(1900:2010,each=2)
year <- factor(yr)
quarter <- factor( rep(1:4,length.out=length(year)) )
period <- factor((yr > mean(yr))+1)
## Random year+quarter effect,
## fixed period effect:
B <- model.matrix(~year+quarter-1)
X <- model.matrix(~period-1)
B <- as(B,"dgTMatrix")
X <- as(X,"dgTMatrix")
u <- rnorm(ncol(B)) ## logsdu=0
beta <- rnorm(ncol(X))*100
eps <- rnorm(nrow(B),sd=1) ## logsdy=0
y <- as.numeric( X %*% beta + B %*% u + eps )
```
]

---
# Implementing RE in TMB

.pull-left[
```cpp
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator()()
{
  DATA_VECTOR(y);
  DATA_SPARSE_MATRIX(B);
  DATA_SPARSE_MATRIX(X);
  PARAMETER_VECTOR(u);
  PARAMETER_VECTOR(beta);
  PARAMETER(lnSDu);
  PARAMETER(lnSDy);

  Type nll = 0;
  Type sdu = exp(lnSDu);
  nll -= dnorm(u, Type(0), sdu, true).sum();

  vector<Type> mu = X * beta + B * u;
  Type sdy = exp(lnSDy);
  nll -= dnorm(y, mu, sdy, true).sum();

  return nll;
}
```
]
.pull-right[
Run Model:
```r
library(TMB)
#compile("src/simple_re.cpp")
dyn.load(dynlib("../src/simple_re"))
obj <- MakeADFun(data=list(y=y, B=B, X=X),
                 parameters=list(u=u*0, beta=beta*0,
                                 lnSDu=1, lnSDy=1),
                 random="u",
                 DLL="simple_re",
                 silent=TRUE)
opt <- nlminb(obj$par, obj$fn, obj$gr)
## true values: beta, lnSDu, lnSDy
c(round(beta,2),0,0)
```

```
## [1] 51.94 30.12  0.00  0.00
```

```r
round(opt$par,2)
```

```
##  beta  beta lnSDu lnSDy 
## 52.01 30.24 -0.16  0.03
```
]
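---
# Implementing RE in TMB: uncertainty

As in the fixed-effects example, sdreport() gives standard errors; for hierarchical models it can also summarize the predicted random effects. A short sketch (not run in the original slides):

```r
sdr <- sdreport(obj)
summary(sdr, "fixed")    # MLEs and SEs for beta, lnSDu, lnSDy
summary(sdr, "random")   # predictions and SEs for the random effects u
```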