class: center, middle, inverse, title-slide # TMB Checking Output and Validation ## FIMS Implementation Team Workshop ### Andrea Havron
NOAA Fisheries, OST
2022-03-23 --- layout: true .footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service] <style type="text/css"> code.cpp{ font-size: 14px; } code.r{ font-size: 14px; } </style> --- # Debugging: Error types<br><br> .pull-left[ 1. Compile time errors 2. TMB-R linkage errors 3. Runtime errors 4. Convergence errors 5. Validation errors ] --- #Maturity at age model<br> The number of mature fish, `\(Y_{y,a}\)` out of the total number of fish, `\(N_{y,a}\)` at year, `\(y\)` and age, `\(a\)`, can be expressed as a binomial distribution, `$$Y_{y,a} \sim Binom(N_{y,a}, p_{y,a})$$`, where `\(p\)` represents the probability of maturity. Common to estimate maturity using logistic regression: `$$log\left(\frac{p_{y,a}}{1-p_{y,a}}\right) = \eta_{y,a} = \beta_0 + \beta_1 a$$` The age of 50% maturity is `\(a_{50} = -\frac{\beta_0}{\beta_1}\)`, --- #Overdispersed model<br> .pull-left[ ```cpp #include <TMB.hpp> template<class Type> Type dbetabinom(Type x, Type n, Type p, Type phi, int do_log) { Type ll = lgamma(n + 1.0) - lgamma(x + 1.0) - lgamma(n - x + 1.0) + lgamma(x + p*phi) + lgamma(n - x +(1-p)*phi) - lgamma(n + phi) + lgamma(phi) - lgamma(p*phi) - lgamma((1-p)*phi); if(do_log) return(ll); else { return(exp(ll)); } } ``` See * [maturity_bb.R](https://github.com/NOAA-FIMS/TMB_training/blob/main/maturity_examples/maturity_bb.R) * [maturity_bb.cpp](https://github.com/NOAA-FIMS/TMB_training/blob/main/maturity_examples/maturity_bb.cpp) ] .pull-right[ ```cpp template<class Type> Type objective_function<Type>::operator()() { DATA_VECTOR(Y); //number mature DATA_VECTOR(N); //number of mature + not mature DATA_VECTOR(age_obs); //age for each observation DATA_INTEGER(max_age); //to generate a maturity ogive PARAMETER_VECTOR(beta); //intercept, slope PARAMETER(log_phi); Type phi = exp(log_phi); int n_obs = Y.size(); vector<Type> nll(n_obs); nll.setZero(); vector<Type> logit_mat(n_obs), mat(n_obs); for(int i = 0; i < n_obs; i++) { logit_mat(i) = beta(0) + beta(1)*age_obs(i); mat(i) = 1/(1 + exp(-logit_mat(i))); nll(i) = -dbetabinom(Y(i), N(i), mat(i), phi, 1); //negative log-likelihood } return sum(nll); } ``` ] --- # Debugging: Error types<br> .pull-left[ 1. **Compile time errors** 2. TMB-R linkage errors 3. Runtime errors 4. Convergence errors 5. Validation errors ] .pull-right[ Fails with errors when compiling * Incorrect syntax: e.g., missing **)** or **;** * Wrong type, for example: * int instead of Type * passing matrix to a function that accepts a vector * Undeclared variable ] .pull-left[ linReg.cpp ```cpp DATA_VECTOR(y DATA_MATRIX(X); PARAMETER_VECTOR(beta); int nll; matrix<Type> mu = X*beta; nll = -dnorm(y,mu,sigma,true) return nll; ``` ] --- # Debugging: Error types<br> .pull-left[ 1. **Compile time errors** 2. TMB-R linkage errors 3. Runtime errors 4. Convergence errors 5. Validation errors ] .pull-right[ ```cpp DATA_VECTOR(y DATA_MATRIX(X); PARAMETER_VECTOR(beta); int nll; matrix<Type> mu = X*beta; nll = -dnorm(y,mu,sigma,true) return nll; ``` ] .pull-left[ linReg.R ```r #compilation flags when using gdbsource #On Linus/OS X TMB::compile("src/linReg.cpp","-O0 -g") #On Windows TMB::compile("src/linReg.cpp","-O1 -g", DLLFLAGS="") ``` ] .pull-right[ Console ```r #Debug using RStudio: > TMB:::setupRStudio() > TMB::compile("src/linReg.cpp") #Debug using gdbsource #hangs on Windows > TMB::gdbsource("R/linReg.R") #works on Windows but not always #informative > TMB::gdbsource("R/linReg.R", interactive = TRUE) ``` ] --- # Debugging: Error types<br> .pull-left[ 1. Compile time errors 2. **TMB-R linkage errors** 3. Runtime errors 4. Convergence errors 5. Validation errors ] .pull-right[ Compiles but MakeADFun crashes * .dll not loaded in R * Incorrect data/parameter types or missing value * Dimension mismatch ] .pull-left[ ```cpp DATA_VECTOR(y); DATA_MATRIX(X); PARAMETER_VECTOR(beta); PARAMETER(lnSigma); Type sigma = exp(lnSigma); Type nll = 0; vector<Type> mu = X*beta; nll -= sum(dnorm(y,mu,sigma,true)); return nll; ``` ] .pull-right[ linReg.R ```r library(TMB) compile(linReg.cpp) Data <- list(y = c(-2.1, 3.3, 4.2), X = data.frame(c(1,1),c(4,5)) Pars <- list(beta = 0) obj <- MakeADFun(data = Data, parameters = Pars, DLL = "linReg") ``` ] --- # Debugging: Error types<br> .pull-left[ 1. Compile time errors 2. TMB-R linkage errors 3. **Runtime errors** 4. Convergence errors 5. Validation errors ] .pull-right[ Model fails during minimization * Data-distribution mismatich * Incorrect parameter transformation * Not taking log of the likelihood * Illegal mathematical operations, eg. log(-1) ] .pull-left[ linReg.cpp ```cpp DATA_VECTOR(y); DATA_MATRIX(X); PARAMETER_VECTOR(beta); PARAMETER(sigma); Type nll = 0; vector<Type> mu = log(X*beta); nll -= sum(dlnorm(y,mu,sigma)); return nll; ``` ] .pull-right[ simple.R ```r library(TMB) compile(linReg.cpp) dyn.lib(dynload("linReg")) Data <- list(y = c(-2.1, 3.3, 4.2), X = cbind(c(1,1,1),c(4,5,2)) Pars <- list(beta = c(0,0), sigma = 0) obj <- MakeADFun(data = Data, parameters = Pars, DLL = "linReg") opt <- nlminb(obj$par, obj$fn, obj$gr) ``` ] --- # Debugging: Error types<br> .pull-left[ 1. Compile time errors 2. TMB-R linkage errors 3. Runtime errors 4. **Convergence errors** 5. Validation errors ] .pull-right[ Model compiles and runs but fails to converge * Singular convergence: model is likely overparameterized * False convergence: likelihood may be discontinuous * Relative convergence but Hessian not positive definite: singularity ] **Non-Invertible Hessians** * The Hessian will **not be invertible** if the MLE is not a true minimum * When could this occur? Usually mis-specified models * Parameters confounded or overparameterized (too complex for data) * TMB will warn about uninvertible Hessians (NaNs) --- # Debugging: Error types<br> .pull-left[ 1. Compile time errors 2. TMB-R linkage errors 3. Runtime errors 4. Convergence errors 5. **Validation errors** ] .pull-right[ Model compiles and runs and passes convergence tests but is incorrect * Model structure is mis-specified * Incorrect distribution for observations * Incorrect random effect structure ] --- # Implementing OSA in TMB<br> .pull-left[ ```cpp DATA_VECTOR(Y); //number mature DATA_VECTOR_INDICATOR(keep, Y); // for OSA residuals DATA_VECTOR(N); //number of mature + not mature DATA_VECTOR(age_obs); //age for each observation DATA_IVECTOR(re_ind); //indicator for random effect for each observation DATA_INTEGER(max_age); //to generate a maturity ogive PARAMETER_VECTOR(beta); //intercept, slope PARAMETER(log_phi); PARAMETER_VECTOR(AR_pars); //AR1 process (2 pars) PARAMETER_VECTOR(re); //annual ar1 process ... Type nll_re = SCALE(AR1(rho_re), sig_re)(re); //NEGATIVE log-likelihood returned.... ... for(int i = 0; i < n_obs; i++) { logit_mat(i) = beta(0) + beta(1)*age_obs(i) + re(re_ind(i)); mat(i) = 1/(1 + exp(-logit_mat(i))); nll(i) = -keep(i) * dbetabinom(Y(i), N(i), mat(i), phi, 1); //negative log-likelihood } ... return sum(nll) + nll_re; } ``` ] .pull-right[ ```r mod = MakeADFun(inputosa$data, inputosa$par, random = "re", DLL = "maturity_bb_re_osa") opt <- nlminb(mod$par, mod$fn, mod$gr) residuals = oneStepPredict(mod, observation.name = "Y", data.term.indicator="keep", method = "oneStepGeneric", discrete=TRUE) plot(input$data$age_obs, residuals$residual) qqnorm(residuals$residual) qqline(residuals$residual) ``` <img src="data:image/png;base64,#static/qqnorm.png" width="60%" style="display: block; margin: auto;" /> * See [maturity_bb_re_osa.cpp](https://github.com/NOAA-FIMS/TMB_training/blob/main/maturity_examples/maturity_bb_re_osa.cpp) * [maturity_bb_re.R](https://github.com/NOAA-FIMS/TMB_training/blob/main/maturity_examples/maturity_bb_re.R) ]