Chapter 12 Generalized Linear Models
12.1 Exercise 1
For this exercise, we will consider presence-absence (really detection, non-detection) observations of willow tits (Poecile montanus) during the Swiss Survey of Common Breeding Birds. The data can be accessed using:
These data are featured in a number of papers by Marc Kery and are also discussed Royle & Dorazio (2008). The Swiss breeding bird survey is conducted annually. Two or 3 times a year, volunteers walk a variable-length route on foot within 1 km\(^2\) plots. The data set contains counts c.i
on each survey occasion (\(i\) = 1, 2, or 3), which have also been reduced to 0’s and 1’s (y.i
; i=1, 2, or 3), with y.i
= 1 if at least 1 bird was seen on survey \(i\) (and 0 otherwise). Usually, one would fit an occupancy or N-mixture model to these data (Royle & Dorazio, 2008), using the repeated counts or detection/non-detection data to inform detection parameters and therefore, adjust for imperfect detection. In this exercise, however, we will fit logistic regression models to a composite indicator variable, y
, equal to 1 if at least 1 bird was seen on any survey and 0 otherwise. We will consider predictor variables measuring elevation (m) and percent forest cover associated with each surveyed plot.
Fit a logistic regression model to the data (
y
) usingglm
. Include forest cover and elevation as predictor variables. Usepoly(elev,2)
to allow for a non-linear relationship betweenelevation
and presence (really, present and detected).Write down a set of equations describing the model and identify estimates of the parameters.
Fit the same model in JAGS and compare the estimated coefficients. Make sure to check for convergence (using
Rhat
), and also usedenplot
to examine posterior distributions for the model parameters. Hint: you can create the columns in the design matrix associated with orthogonal polynomials for elevation outside of JAGS using thepoly
function and then include these predictors in thedata
argument tojags
. You can pass an \(n \times 2\) matrix of elevation values tojags
usingjags(data = list(elevation=poly(willowdat$elev,2), ...), inits=, ...)
. You can then refer to these data in the likelihood part of the JAGS model usingelevation[i,1]
(for the linear term) andelevation[i,2]
(for the quadratic term).Use the model from [1] (fit using
glm
) to generate a plot of predicted probabilities of presence (really, present and detected) as a function ofelevation
(while holdingforest
at the mean level in the data set). Also, plot predicted probabilities of presence (really, present and detected) as a function offorest
, while holdingelevation
at the mean level in the data set. You may use theeffects
package - or create the plots “from scratch” by making us of thepredict
function.Optional: create a similar effect plot using JAGS.
- Create a new data set representing the cases where you want to get predicted values. This gets a little tricky when using
ns
orpoly
. Specifically, you would need to use the same set of basis functions as were used to create the predictor variables in the original model fit (see examples at the bottom of the help page forpoly
). This step would require something like:
# First, generate the original polynomials, but also store the basis functions
elevation <- poly(wdat$elev, 2)
# Now, use the information stored in elevation to create polynomials
# for new observations where we want predictions
new.elevation <- predict(elevation, seq(min(wdat$elev), max(wdat$elev), length = 100))
# We would also need to calculate the mean of forest
m.forest <- rep(mean(wdat$forest), 100)
Include
m.forest
andnew.elevation
in thedata
argument of JAGS.Construct the linear predictor for these new data and take the inverse logit of the linear predictor to get
hat.p[i]
(i.e., \(\hat{p}_i\)) for these new data.Confidence bands can then be plotted using the quantiles of the posterior of
hat.p[i]
12.2 Exercise 2
For this exercise, you will compare results of fitting logistic regression models using Bayesian methods with different prior distributions for the regression coefficients. We will model the fate of passengers on the Titanic using predictors capturing economic status (passengerClass
), sex
and age
. The data can be accessed using:
Drop observations where
age
is missing, and then create a centered and scaled version of theage
variable. Estimate parameters of a logistic regression model withsex
,age
(centered and scaled), andpassengerClass
included in the model. You are free to use either effect or means coding to modelpassengerClass
. Use \(N(\mu = 0, \tau = 0.001)\) priors for the regression parameters.Repeat the analysis using \(N(\mu = 0, \tau = 1)\) priors for the regression parameters. Compare the posterior distributions for the two model fits using the
MCMCplot
function in theMCMCvis
package (Youngflesh, 2018). Comment on any differences.Write down a set of equations for the logistic regression model you just fit and identify any parameters in the summary output. Interpret the coefficients of the fitted model.
Calculate the odds of surviving for a first class passenger relative to a passenger in the 3rd class, holding
age
andsex
constant. Also, calculate a 95% credible interval for this odds ratio. Use the model fit in step 1 to calculate these quantities. Interpret this odds ratio. Hint 1: the odds of a 1st class versus 3rd class passenger surviving = 1 / the odds of a 3rd class versus 1st class passenger surviving. Hint 2: to calculate the credible interval, you will want to access the MCMC samples returned by JAGS. One option is to use theMCMCchains
function in theMCMCvis
package. See?MCMCchains
to access the help page for this function.Use the model from step 1 to estimate the probability that a first class, male, with average age in the data set survived the sinking of the titanic.
12.3 Exercise 3
For this exercise, you will fit count-based regression models to location data from of 3605 individual trees of Beilschmiedia pendula in 50-ha (500 x 1000 m) forest plot in Barro Colorado (Panama). The data set is contained in the spatstat
package (Baddeley & Turner, 2005; Baddeley, Rubak, & Turner, 2015). This exercise will build on a short tutorial put together by Petr Keil which can be located here. Use this template with code from the tutorial, which I have modified in a few ways. This code will run through Petr’s original data development and exploration steps. It will also fit a Poisson regression model using glm
and JAGS. The JAGS code will produce both confidence intervals and prediction intervals and conduct a Bayesian goodness-of-fit test for the Poisson model.
Run the code in the template. Comment on the appropriateness of the Poisson model in light of the goodness-of-fit test and the prediction intervals.
Copy the code for fitting the Poisson model using
glm
andJAGS
. Then, modify the copiedglm
code to fit a negative binomial model with elevation and elevation\(^2\) using theglm.nb
function in theMASS
package. Compare the negative binomial and Poisson regression models (fit usingglm
andglm.nb
) using AIC.Modify the JAGS code for fitting the Poisson model to allow fitting of a negative binomial model in JAGS. Calculate predicted values and a Bayesian p-value to assess goodness-of-fit of the negative binomial model. Comment on the appropriateness of the negative binomial model in light of the goodness-of-fit test, AIC, estimate of \(\theta\), and the prediction intervals.
Write down a set of equations describing the fitted negative binomial model. Match any parameters in the model to the estimates obtained from JAGS.
12.4 Exercise 4
This exercise will largely follow and build on an example put together by Jack Weiss for his ecological statistics course using data that I believe come from Kennedy, Marra, Fagan, & Neel (2010). You will compare bird species richness in different habitat patches sampled in Jamaica. The data can be accessed using:
One of the primary objectives of the study was to determine if species richness (S
) differed between different landscape types (landscape
= Agriculture, Bauxite, Forest, or Urban). Variable-sized habitat patches were surveyed in three consecutive years from 2005 through 2007. For now, we will ignore the repeated measures aspect of these data and assume all of the observations are mutually independent. We will revisit this assumption and consider models with random effects in a later exercise.
Plot species richness (
S
) versus area of the habitat patch (area
) with a separate color for each landscape type (landscape
). Add a smooth line depicting the trend.Fit a Poisson regression model with
landscape
andyear
as predictors andlog.area.
included as an offset.
Describe the model using text and equations and match the parameters from the R output to those in your model description.
Fit a second model where
log.area.
is included as an additional predictor. Examine the estimated coefficient associated withlog.area.
, and use the AIC to compare this model to the one in step 2. Which model would you choose and why?Fit a negative binomial model with
landscape
,year
, andlog.area.
as predictors. UseMASS::glm.nb
to fit the model. Compare this model to the one from setp 4 using AIC.Describe the negative binomial model using text and equations and match the parameters from the R output to those in your model description.
So far, we have compared models using AIC, but the “AIC-best” model may still be a crappy model. Create residual plots for the negative binomial model using the
residualPlots
function in the car library.Fit a negative binomial with
landscape
andpoly(log.area.,2))
(i.e., a quadratic polynomial forlog.area.
).Fit a Poisson model with
landscape
andpoly(log.area.,2))
. Examine residual plots for this model usingresidualPlots
function.Summarize what you learned from steps 7-9. Consider the residual plots, the estimate of \(\theta\) from the model fit in step 8.
12.5 Exercise 5
This exercise will also consider the data from Exercise 4.
Use JAGS to fit a Poisson regression model relating species richness (
S
) tolandscape
,log.area
., andlog.area.^2
.Implement and interpret a Bayesian goodness-of-fit test for the model in step 1.
12.6 Exercise 6
Consider the data set birdmalariaLFS
in the Data4Ecologists
library. These data are associated with the following paper:
Asghar, M., Hasselquist, D., Hansson, B., Zehtindjiev, P., Westerdahl, H., & Bensch, S. (2015). Hidden costs of infection: chronic malaria accelerates telomere degradation and senescence in wild birds. Science, 347(6220), 436-438.
The authors were interested in whether chronic infection might the impact fitness of wild great reed warblers. The data set contains the following variables:
LFS
= lifetime reproductive success (number of offspring produced)Sex
of bird (0 if male? 1female)year
= year of birthTlifespan
= total lifespan (in years)infected
= infected status (0 = uninfected thoughout life, 1 = infected at 1 year of life and remained infected throughout life)
The data set can be accessed using:
Fit Poisson and Negative Binomial models relating lifetime reproductive success to the sex of the bird, its birth year, total lifespan, and infected status. Allow the effect of total lifespan to differ for males and females. Also, assume the effect of birth year can be modeled using a continuous variable representing a trend over time.
Write down a set of equations describing each model and identify the parameter estimates in the output from R.
Using the negative binomial model, estimate the mean lifetime reproductive success for a female, born in 1991, that lives 5 years while infected with malaria.