Chapter 12 Generalized Linear Models

12.1 Exercise 1

For this exercise, we will consider presence-absence (really detection, non-detection) observations of willow tits (Poecile montanus) during the Swiss Survey of Common Breeding Birds. The data can be accessed using:

library(Data4Ecologists)
data(willowtits)

These data are featured in a number of papers by Marc Kery and are also discussed Royle & Dorazio (2008). The Swiss breeding bird survey is conducted annually. Two or 3 times a year, volunteers walk a variable-length route on foot within 1 km\(^2\) plots. The data set contains counts c.i on each survey occasion (\(i\) = 1, 2, or 3), which have also been reduced to 0’s and 1’s (y.i; i=1, 2, or 3), with y.i = 1 if at least 1 bird was seen on survey \(i\) (and 0 otherwise). Usually, one would fit an occupancy or N-mixture model to these data (Royle & Dorazio, 2008), using the repeated counts or detection/non-detection data to inform detection parameters and therefore, adjust for imperfect detection. In this exercise, however, we will fit logistic regression models to a composite indicator variable, y, equal to 1 if at least 1 bird was seen on any survey and 0 otherwise. We will consider predictor variables measuring elevation (m) and percent forest cover associated with each surveyed plot.

  1. Fit a logistic regression model to the data (y) using glm. Include forest cover and elevation as predictor variables. Use poly(elev,2) to allow for a non-linear relationship between elevation and presence (really, present and detected).

  2. Write down a set of equations describing the model and identify estimates of the parameters.

  3. Fit the same model in JAGS and compare the estimated coefficients. Make sure to check for convergence (using Rhat), and also use denplot to examine posterior distributions for the model parameters. Hint: you can create the columns in the design matrix associated with orthogonal polynomials for elevation outside of JAGS using the poly function and then include these predictors in the data argument to jags. You can pass an \(n \times 2\) matrix of elevation values to jags using jags(data = list(elevation=poly(willowdat$elev,2), ...), inits=, ...). You can then refer to these data in the likelihood part of the JAGS model using elevation[i,1] (for the linear term) and elevation[i,2] (for the quadratic term).

  4. Use the model from [1] (fit using glm) to generate a plot of predicted probabilities of presence (really, present and detected) as a function of elevation (while holding forest at the mean level in the data set). Also, plot predicted probabilities of presence (really, present and detected) as a function of forest, while holding elevation at the mean level in the data set. You may use the effects package - or create the plots “from scratch” by making us of the predict function.

  5. Optional: create a similar effect plot using JAGS.

  1. Create a new data set representing the cases where you want to get predicted values. This gets a little tricky when using ns or poly. Specifically, you would need to use the same set of basis functions as were used to create the predictor variables in the original model fit (see examples at the bottom of the help page for poly). This step would require something like:
# First, generate the original polynomials, but also store the basis functions
elevation <- poly(wdat$elev, 2) 

# Now, use the information stored in elevation to create polynomials
# for new observations where we want predictions
new.elevation <- predict(elevation, seq(min(wdat$elev), max(wdat$elev), length = 100))

# We would also need to calculate the mean of forest 
m.forest <- rep(mean(wdat$forest), 100)
  1. Include m.forest and new.elevation in the data argument of JAGS.

  2. Construct the linear predictor for these new data and take the inverse logit of the linear predictor to get hat.p[i] (i.e., \(\hat{p}_i\)) for these new data.

  3. Confidence bands can then be plotted using the quantiles of the posterior of hat.p[i]

12.2 Exercise 2

For this exercise, you will compare results of fitting logistic regression models using Bayesian methods with different prior distributions for the regression coefficients. We will model the fate of passengers on the Titanic using predictors capturing economic status (passengerClass), sex and age. The data can be accessed using:

library(carData)
data("TitanicSurvival")
  1. Drop observations where age is missing, and then create a centered and scaled version of the age variable. Estimate parameters of a logistic regression model with sex, age (centered and scaled), and passengerClass included in the model. You are free to use either effect or means coding to model passengerClass. Use \(N(\mu = 0, \tau = 0.001)\) priors for the regression parameters.

  2. Repeat the analysis using \(N(\mu = 0, \tau = 1)\) priors for the regression parameters. Compare the posterior distributions for the two model fits using the MCMCplot function in the MCMCvis package (Youngflesh, 2018). Comment on any differences.

  3. Write down a set of equations for the logistic regression model you just fit and identify any parameters in the summary output. Interpret the coefficients of the fitted model.

  4. Calculate the odds of surviving for a first class passenger relative to a passenger in the 3rd class, holding age and sex constant. Also, calculate a 95% credible interval for this odds ratio. Use the model fit in step 1 to calculate these quantities. Interpret this odds ratio. Hint 1: the odds of a 1st class versus 3rd class passenger surviving = 1 / the odds of a 3rd class versus 1st class passenger surviving. Hint 2: to calculate the credible interval, you will want to access the MCMC samples returned by JAGS. One option is to use the MCMCchains function in the MCMCvis package. See ?MCMCchains to access the help page for this function.

  5. Use the model from step 1 to estimate the probability that a first class, male, with average age in the data set survived the sinking of the titanic.

12.3 Exercise 3

For this exercise, you will fit count-based regression models to location data from of 3605 individual trees of Beilschmiedia pendula in 50-ha (500 x 1000 m) forest plot in Barro Colorado (Panama). The data set is contained in the spatstat package (Baddeley & Turner, 2005; Baddeley, Rubak, & Turner, 2015). This exercise will build on a short tutorial put together by Petr Keil which can be located here. Use this template with code from the tutorial, which I have modified in a few ways. This code will run through Petr’s original data development and exploration steps. It will also fit a Poisson regression model using glm and JAGS. The JAGS code will produce both confidence intervals and prediction intervals and conduct a Bayesian goodness-of-fit test for the Poisson model.

  1. Run the code in the template. Comment on the appropriateness of the Poisson model in light of the goodness-of-fit test and the prediction intervals.

  2. Copy the code for fitting the Poisson model using glm and JAGS. Then, modify the copied glm code to fit a negative binomial model with elevation and elevation\(^2\) using the glm.nb function in the MASS package. Compare the negative binomial and Poisson regression models (fit using glm and glm.nb) using AIC.

  3. Modify the JAGS code for fitting the Poisson model to allow fitting of a negative binomial model in JAGS. Calculate predicted values and a Bayesian p-value to assess goodness-of-fit of the negative binomial model. Comment on the appropriateness of the negative binomial model in light of the goodness-of-fit test, AIC, estimate of \(\theta\), and the prediction intervals.

  4. Write down a set of equations describing the fitted negative binomial model. Match any parameters in the model to the estimates obtained from JAGS.

12.4 Exercise 4

This exercise will largely follow and build on an example put together by Jack Weiss for his ecological statistics course using data that I believe come from Kennedy, Marra, Fagan, & Neel (2010). You will compare bird species richness in different habitat patches sampled in Jamaica. The data can be accessed using:

library(Data4Ecologists)
data(birds)

One of the primary objectives of the study was to determine if species richness (S) differed between different landscape types (landscape = Agriculture, Bauxite, Forest, or Urban). Variable-sized habitat patches were surveyed in three consecutive years from 2005 through 2007. For now, we will ignore the repeated measures aspect of these data and assume all of the observations are mutually independent. We will revisit this assumption and consider models with random effects in a later exercise.

  1. Plot species richness (S) versus area of the habitat patch (area) with a separate color for each landscape type (landscape). Add a smooth line depicting the trend.

  2. Fit a Poisson regression model with landscape and year as predictors and log.area. included as an offset.

  3. Describe the model using text and equations and match the parameters from the R output to those in your model description.

  4. Fit a second model where log.area. is included as an additional predictor. Examine the estimated coefficient associated with log.area., and use the AIC to compare this model to the one in step 2. Which model would you choose and why?

  5. Fit a negative binomial model with landscape, year, and log.area. as predictors. Use MASS::glm.nb to fit the model. Compare this model to the one from setp 4 using AIC.

  6. Describe the negative binomial model using text and equations and match the parameters from the R output to those in your model description.

  7. So far, we have compared models using AIC, but the “AIC-best” model may still be a crappy model. Create residual plots for the negative binomial model using the residualPlots function in the car library.

  8. Fit a negative binomial with landscape and poly(log.area.,2)) (i.e., a quadratic polynomial for log.area.).

  9. Fit a Poisson model with landscape and poly(log.area.,2)). Examine residual plots for this model using residualPlots function.

  10. Summarize what you learned from steps 7-9. Consider the residual plots, the estimate of \(\theta\) from the model fit in step 8.

12.5 Exercise 5

This exercise will also consider the data from Exercise 4.

  1. Use JAGS to fit a Poisson regression model relating species richness (S) to landscape, log.area., and log.area.^2.

  2. Implement and interpret a Bayesian goodness-of-fit test for the model in step 1.

12.6 Exercise 6

Consider the data set birdmalariaLFS in the Data4Ecologists library. These data are associated with the following paper: Asghar, M., Hasselquist, D., Hansson, B., Zehtindjiev, P., Westerdahl, H., & Bensch, S. (2015). Hidden costs of infection: chronic malaria accelerates telomere degradation and senescence in wild birds. Science, 347(6220), 436-438.

The authors were interested in whether chronic infection might the impact fitness of wild great reed warblers. The data set contains the following variables:

  • LFS = lifetime reproductive success (number of offspring produced)
  • Sex of bird (0 if male? 1female)
  • year = year of birth
  • Tlifespan = total lifespan (in years)
  • infected = infected status (0 = uninfected thoughout life, 1 = infected at 1 year of life and remained infected throughout life)

The data set can be accessed using:

library(Data4Ecologists)
data("birdmalariaLFS")
  1. Fit Poisson and Negative Binomial models relating lifetime reproductive success to the sex of the bird, its birth year, total lifespan, and infected status. Allow the effect of total lifespan to differ for males and females. Also, assume the effect of birth year can be modeled using a continuous variable representing a trend over time.

  2. Write down a set of equations describing each model and identify the parameter estimates in the output from R.

  3. Using the negative binomial model, estimate the mean lifetime reproductive success for a female, born in 1991, that lives 5 years while infected with malaria.

References

Baddeley, A., Rubak, E., & Turner, R. (2015). Spatial point patterns: Methodology and applications with R. London: Chapman; Hall/CRC Press. Retrieved from https://www.routledge.com/Spatial-Point-Patterns-Methodology-and-Applications-with-R/Baddeley-Rubak-Turner/9781482210200/
Baddeley, A., & Turner, R. (2005). spatstat: An R package for analyzing spatial point patterns. Journal of Statistical Software, 12(6), 1–42. Retrieved from https://www.jstatsoft.org/v12/i06/
Kennedy, C. M., Marra, P. P., Fagan, W. F., & Neel, M. C. (2010). Landscape matrix and species traits mediate responses of neotropical resident birds to forest fragmentation in jamaica. Ecological Monographs, 80(4), 651–669.
Royle, J. A., & Dorazio, R. M. (2008). Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities. Elsevier.
Youngflesh, C. (2018). MCMCvis: Tools to visualize, manipulate, and summarize MCMC output. Journal of Open Source Software, 3(24), 640. doi:10.21105/joss.00640