class: center, middle, inverse, title-slide # Parameters and Simulation ## TMB Training Session I ### Andrea Havron
NOAA Fisheries, OST --- layout: true .footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service] <style type="text/css"> code.cpp{ font-size: 14px; } code.r{ font-size: 14px; } </style> --- class: middle #Dealing with parameters --- #Parameter Transformations .pull-left[ Map parameters from real to parameter space * Minimizers search for MLE from `\(-\infty\)` to `\(+\infty\)` * Certain parameters have restricted support: * `\(\sigma^{2} > 0\)` ```cpp sigma = exp(ln_sigma) ``` * `\(0 < \pi < 1\)` ```cpp pi = 1/(1 + exp(-logit_pi)) # "expit" ``` ] .pull-right[ ![](data:image/png;base64,#02_Parameters_files/figure-html/unnamed-chunk-1-1.png)<!-- -->![](data:image/png;base64,#02_Parameters_files/figure-html/unnamed-chunk-1-2.png)<!-- --> ] --- #Expected Value Transformations <br> * Generalized Linear or Generalized Linear Mixed Models: model a function of the mean that is linear in X: `$$\text{GLM: }\eta = X\beta$$` `$$\text{GLMM: }\eta = X\beta + u$$` `$$E[Y] = g^{-1}(\eta)$$` * Distributions from the [**exponential family**](https://en.wikipedia.org/wiki/Exponential_family) have natural transformations and inverse link functions: `$$y \sim \text{Pois}(exp(\eta))$$` `$$y \sim \text{Binomial}(N, expit(\eta))$$` --- #Expected Value Transformations .three-column[ **Multinomial:** `$$p_{1}, ... p_{k}$$` `$$p_{k} = 1 - \sum^{k-1}_{i=1}p_{i}$$` ] .three-column[ Natural link function `\begin{bmatrix} log\frac{p_{1}}{1 - \sum^{k-1}_{i=1}p_{i}}\\ \vdots\\ log\frac{p_{k-1}}{1 - \sum^{k-1}_{i=1}p_{i}}\\ 0 \end{bmatrix}` ] .three-column[ Inverse Link Function `\begin{bmatrix} \frac{exp(\eta_{1})}{1 + \sum^{k-1}_{i=1}exp(\eta_{i})}\\ \vdots\\ \frac{exp(\eta_{k-1})}{1 + \sum^{k-1}_{i=1}exp(\eta_{i})}\\ \frac{1}{1 + \sum^{k-1}_{i=1}exp(\eta_{i})} \end{bmatrix}` ] <br> `\(\hspace{1.7in}\)`**Gamma:** `\begin{align} E[Y] &= \alpha\beta = exp(\eta)\\ Var[Y] &= \alpha\beta^{2}\\ cv^{2} &= \frac{\alpha\beta^{2}}{\alpha^{2}\beta^{2}} = \frac{1}{\alpha}\\ y &\sim \text{Gamma}(\frac{1}{cv^{2}}, exp(\eta)*cv^{2}) \end{align}` --- #Non-linear Regression [**Orange tree example**](https://github.com/kaskr/adcomp/blob/master/tmb_examples/orange_big.cpp): Expected size of tree, `\(i\)` at time `\(t\)` is a logistic function of age:<br> `$$\mu_{i,t} = \frac{a}{1 + exp(-(age_{i,t} - b)/c)}$$`<br> a = upper maximum size<br> b = age when circumference = 1/2a<br> c = inverse growth rate<br> `$$y_{i,t} \sim N(\mu_{i,t}, \sigma)$$` --- #Parameter mapping #### TMB allows users to collect and fix parameters using the map argument in MakeADFun() .large[.p[ * The parameter map is a list of parameters that are fixed or collected in a model run * Names and dimensions of parameters in the map list must match those of the parameter list * The map list is structured as a list of factors * Parameters with factor(NA) are fixed * Parameters with equal factor levels are collected to a common value ]] --- #Likelihood Ratio Tests .pull-left[ `\(H_{0}\)`: Null Hypothesis .p[ - `\(\theta \in \Theta_{0}\)` - parameter `\(\theta\)` is in a subset of a given parameter space, `\(\Theta_{0}\)` - restricted model ]] .pull-right[ `\(H_{1}\)`: Alternative Hypothesis .p[ - `\(\theta \in \Theta\)` - parameter `\(\theta\)` is in the complete parameter space, `\(\Theta\)` - unrestricted, full model ]] <br> **Likelihood Ratio Test**: `$$LR = -2[\ell(\theta_{0}) - \ell(\hat{\theta})]$$` `$$LR \sim \chi^{2}(df_{\chi^{2}} = df_{\Theta}-df_{\Theta_{0}})$$` --- # Parameter mapping with LR test <br> TMB example: [**lr_test.cpp**](https://github.com/kaskr/adcomp/blob/master/tmb_examples/lr_test.cpp) ```cpp / Illustrate map feature of TMB to perform likelihood ratio tests on a ragged array dataset. #include <TMB.hpp> template<class Type> Type objective_function<Type>::operator() () { DATA_VECTOR(obs); DATA_FACTOR(group); PARAMETER_VECTOR(mu); PARAMETER_VECTOR(sd); Type res=0; for(int i=0;i<obs.size();i++){ res -= dnorm(obs[i],mu[group[i]],sd[group[i]],true); } return res; } ``` --- class: middle # Simulation --- # Simulation in TMB * Standard generator functions to simulate data within the TMB model: ```cpp rnorm() rpois() runif() rbinom() rgamma() rexp() rbeta() rf() rlogis() rt() rweibull() rcompois() rtweedie() rnbinom() rnbinom2() ``` * Simulation blocks are used to call simulations from R .pull-left[ **Binomial Example**<br> C++ code ```cpp for(int i = 0; i < n_obs; i++){ eta(i) = beta(0) + beta(1)*x(i); p(i) = 1/(1 + exp(-eta(i))); nll(i) = -dbinom(Y(i), N(i), p(i), true); SIMULATE{ Y(i) = rbinom(N(i), p(i)); } } SIMULATE{ REPORT(Y); } ``` ] .pull-right[ <br> R code ```r set.seed(1) ## optional obj$simulate(complete = TRUE) ``` *Note* * optimization not necessary for simulation * not vectorized like in R ] --- #Simulation Study .pull-left[ linReg.cpp ```cpp for(int i=0; i<n; i++){ nll -= dnorm(y(i), mu(i), sigma, true); } SIMULATE{ y = rnorm(mu, sigma); REPORT(y); } ``` Set-up Model in R ```r library(TMB) dyn.load(dynlib('../src/linReg')) sig <- 2 beta <- c(0.5,2) Data <- list(y = rep(0,50), X = cbind(rep(1,50), 1:50)) Pars <- list(beta = c(0.5,2),lnSigma = log(2)) obj <- MakeADFun(data = Data, parameters = Pars, DLL = 'linReg') obj$par ``` ``` ## beta beta lnSigma ## 0.5000000 2.0000000 0.6931472 ``` ] .pull-right[ Run Simulation ```r set.seed(1) sim <- replicate(500, { simdata <- obj$simulate(par=obj$par, complete=TRUE) obj2 <- MakeADFun(simdata, Pars, DLL="linReg", silent=TRUE) nlminb(obj2$par, obj2$fn, obj2$gr)$par }) obj$par ``` ``` ## beta beta lnSigma ## 0.5000000 2.0000000 0.6931472 ``` ```r apply(t(sim), 2, mean) ``` ``` ## beta beta lnSigma ## 0.5143324 1.9995115 0.6649257 ``` ]