Chapter 14 Linear Mixed Models

14.1 Exercise 1

This exercise closely follows by a problem in Kéry (2010) (exercise 1 of Chapter 12). He points out that there are only slight differences in how one specifies random effects and fixed effects in JAGS. When using fixed effects, each intercept (and or slope) is assigned a prior, usually a normal distribution centered on 0 with a small precision parameter. When using random effects, intercepts (and or slopes) are assigned the same prior, usually a normal distribution with a non-zero mean and precision parameter - these mean and precision parameters are in turn assigned hyperpriors.

  1. To help see the distinction between fixed and random effects modify the JAGS code used to fit the random intercepts model to the pines data set in Section 18 of the book (code below) so that it is equivalent to fitting the following fixed effects model:

\[\begin{gather} dbh_{ij} \sim N(\mu_i, \sigma^2) \\ \mu_i = \beta_{0j} + \beta_1 agec_{ij}, \end{gather}\]

Here, \(agec_{ij}\) is the (scaled and centered) age associated with the \(i^{th}\) tree at the \(j^{th}\) site and \(dbh_{ij}\) is the diameter at breast height for with the \(i_{th}\) tree at the \(j^{th}\) site. The model has 22 parameters:

  • \(\beta_{0j}\) (for \(j = 1, 2, \ldots, 20\)) are site-specific intercepts
  • \(\beta_1\) is the slope associated with agec
  • \(\sigma^2\) describes the variance of the errors
jags.lme.alt<-function(){   
  
  # Priors for the intercepts   
  for (i in 1:n.groups){        
    alpha[i] ~ dnorm(beta0, tau.b0)   # Random intercepts 
  } 
      
  # Hyper priors that describe the distribution of the site-specific intercepts   
  beta0 ~ dnorm(0, 0.001)      # Mean intercept
  tau.b0 <- 1 / (sigma.b0 * sigma.b0)    
  sigma.b0 ~ dunif(0, 100)     # SD of the random intercepts   
      
  # Priors for fixed effect regression parameters 
  beta1 ~ dnorm(0, 0.001)           # Common slope agec 
  tau.eps <- 1 / ( sigma.eps * sigma.eps)       # Residual precision    
  sigma.eps ~ dunif(0, 100)         # Residual standard deviation   
      
  # Likelihood  
  for (i in 1:nobs) {   
    dbh[i] ~ dnorm(mu[i], tau.eps)     # The random variable   
    mu[i] <- alpha[site[i]] + beta1*agec[i]  # Expectation   
  } 
  }     
  1. Compare your estimates from JAGS to those obtained when fitting an equivalent model using lm in R:

lm.fixed <- lm(dbh ~ agec + site - 1, data = pines)

14.2 Exercise 2

Mark Ditmer, a former post-doc at the University of Minnesota, along with collaborators at the UMN and MN DNR, used a combination of GPS collars and heart rate monitors to explore relationships between movement and heart rates of black bears in Minnesota. For this exercise, you will explore the bearmove data set in the Data4Ecologists package, which can be accessed using:

library(Data4Ecologists)
data(bearmove)

This data set consists of observations from 9 different ‘bear-years’ (unique combinations of bears and years). The goal will be to characterize how log(movement rate) and Season influence log(heart rate). We will fit models that include Season, log(movement rate), and the interaction between log(movement rate) and Season as predictor variables.

  1. Create a centered and scaled version of log.move and name this variable log.move.sc. Use the lm function to fit the following linear model log(hr) ~ Season*log.move.sc - under the naive assumption that ALL observations are independent. Interpret the coefficients in the fitted model.

  2. Calculate 95% confidence intervals for the coefficients using the confint function. Note: these intervals assume the observations are independent (a poor assumption).

  3. Assume that observations from different bears, and also observations from the same bear but taken in different years, are independent. Fit a random intercepts model to the data using lmer in the lme4 package: lmer(log(hr) ~ season*log.move.sc + (1 | BearIDYear), data = ) to specify a random intercept for each “bear-year.” Calculate 95% confidence intervals for the fixed effects parameters using confint(model.name, method="Wald"). These are asymptotic, normal-based intervals. Alternatively, we could use profile-based (method = "profile") or bootstrap-based (method = "boot") intervals. These latter methods are preferred, but they are computationally intensive and will sometimes fail.

  4. Fit a random coefficients model to the data using lmer(log(hr) ~ log.move.sc*Season + (log.move.sc*Season|BearIDYear), data= ). Calculate 95% confidence intervals for the fixed effects parameters using confint(model.name, method="Wald"). Note: if you try to fit this model using log.move rather than log.move.sc, lmer will fail to converge (centering and scaling can be really important when fitting random coefficient models!).

  5. Explore whether the assumptions of the mixed effects models in steps c) and d) are reasonably met. Specifically, consider residual versus fitted value plots and qqplots of the random effects and within-bear residuals. The check_model function in the performance library is a good choice for exploring potential assumption violations.

  6. Plot residuals from the model versus Julian date for each bear. One tip here - you can use the fortify.merMod function to create a data set containing all of the information in bearmove, along with fitted values and residuals from your fitted models. The augment function in the broom.mixed package will also do this for you. You should find that the within-individual residuals are highly autocorrelated, possibly a result of missing one or more important predictors. We could try to allow for a smooth effect of Julian day or allow the errors to be autocorelated. An alternative option for characterizing the population-level relationship between movement and heart rates would be to fit a simple linear model (as in step (a)) and then use a cluster-level bootstrap for inference. The advantage of this approach is that we do not have to make assumptions about normally distributed random effects or independent and normally distributed within-individual residuals. The disadvantage of this approach is that it will be conservative when estimating confidence intervals for coefficients associated with variables that are not constant within a cluster. Calculate SE’s and confidence intervals using a cluster-level bootstrap.

  7. Compare your confidence intervals from steps 2, 3, 4, and 6. Do these results make sense in light of the findings of Schielzeth & Forstmeier (2009) and also what we know about how the data were collected (i.e., that we observed the same animals repeatedly over time)?

Optional further exploration: There are other model formulations that we could consider in addition to the random intercept and random intercept and slope models fit in steps 2 and 3. You might consider the following models below:

  • lmer(log.hr ~ log.move.sc*Season + (log.move.sc | BearIDYear:Season), data = )

  • lmer(log.hr ~ log.move.sc*Season + (log.move.sc | BearIDYear) + (log.move.sc | BearIDYear:Season), data = )

Inspect the summary output for these models and reflect on:

  1. The induced dependence structure of the observations
  2. The number of variance parameters estimated by the model

Hint: One way to gain insights into the implications of these different formulations is to consider the structuring variables used to model the random effects, i.e., BearIDYear and BearIDYear:Season. Each unique value of these structuring variables leads to a random deviation from the overall mean parameter (i.e., mean intercept or mean slope). Observations that share a grouping variable will be correlated. Furthermore, when there are two grouping variables, we end up with varying levels of dependency; observations that share values for both grouping variables will be more correlated than observations that share only one of the two grouping variables.

14.3 Exercise 3

The length of telomeres8 decreases as individuals get older and this shortening has been associated with aging. Consider the birdmalariaTLdata set in the Data4Ecologists library. These data are associated with the following paper:

Asghar, M., Hasselquist, D., Hansson, B., Zehtindjiev, P., Westerdahl, H., & Bensch, S. (2015). Hidden costs of infection: chronic malaria accelerates telomere degradation and senescence in wild birds. Science, 347(6220), 436-438.

The data set contains the following variables:

  • LFS = lifetime reproductive success (number of offspring produced)
  • Sex of bird (0 if male and 1 if female)
  • year = year of birth
  • Age = age of the bird at the time of the observation
  • infected = infected status (0 = uninfected thoughout life, 1 = infected at 1 year of life and remained infected throughout life)
  • TL = telomere length

We can access the data accessed using:

library(Data4Ecologists)
data("birdmalariaTL")
  1. Fit a random intercept model to test whether the rate of decrease in telomere length as birds age differs between infected and non-infected birds after adjusting for any differences due to the sex of the individual. In other words, fit a model that contains Age, Sex, infected, and an interaction between Age and infected. Also, include a random intercept associated with each individual (Ind) in an attempt to deal with statistical dependencies resulting from having multiple measurements on each individual. Using the summary function, inspect the estimated parameters and make a conclusion based on this information. You might consider confidence intervals for model parameters or formal hypothesis tests (e.g., provided by the lmerTest package).

  2. Schielzeth & Forstmeier (2009) cautioned against using random intercept (only) models when estimating effects of predictors that vary within an individual. Let’s consider the different predictors in the model. The variables Sex and infected take on a single value for each individual. Therefore, they are not candidates for including random slopes. The variable Age, however, takes on multiple values for each individual. Fit a random intercept and slope model that allows the rate of decrease in telomere length as individuals age to vary across individuals. Do not make the Age:infected interaction random (this column of the design matrix will be 0 for all males and therefore will be constant for these individuals). Compare the two models in terms of:

    1. Estimates of the Age and Age:infected coefficients.
    2. Degrees of freedom associated with these coefficients when using the lmerTest package.
    3. Confidence intervals for the coefficients.

Comment on any differences. Are they what you would expect?

  1. Compare the models from steps 1 and 2 using AIC or a likelihood ratio test. Which model do you prefer and why?

  2. Write down the model using a set of equations and identify all of the model parameters in the R output created using the summary function.

  3. Use the check_model function to evaluate whether the model assumptions are reasonably met. Be sure to highlight any potential issues.

  4. Use the model from step 2 to predict the expected telomere length for an infected female that is 4 years old.

14.4 Exercise 4

Consider the data set birdmalariaLFS in the Data4Ecologists library. These data are associated with the following paper: Asghar, M., Hasselquist, D., Hansson, B., Zehtindjiev, P., Westerdahl, H., & Bensch, S. (2015). Hidden costs of infection: chronic malaria accelerates telomere degradation and senescence in wild birds. Science, 347(6220), 436-438.

The authors were interested in whether chronic infection might impact the fitness of wild great reed warblers. The data set contains the following variables:

  • ID = individual bird (offspring) identity at birth
  • brood = unique brood identifier (birds born to the same mother will be in the same brood)
  • broodyear = year of birth
  • Sex = Sex of the bird (0 if male, 1 if female)
  • offBTL = Offspring ealy-life telomere length (at 9 day age)
  • mmal = Mother malaria status at the time of breeding (1 if infected, 0 otherwise)
  • mage = Mother age at the time of breeding
  • dam = Unique identifier for each mother
  • Sire = Unique identifier for each father
  • fmal = Father malaria status at the time of breeding (1 if infected, 0 otherwise)
  • fage = Father age at time of breeding

The data set can be accessed using:

library(Data4Ecologists)
data("birdmalariaO")
  1. The authors state that they fit a linear mixed effect model with early-life telomere length of individual offspring [OffBTL] as the dependent variable, mother’s ID (dam) and brood ID (brood) as random factors, and brood year (broodyear), offspring sex (Sex), mother’s age (mage), mother’s malaria status (mmal) and the interaction mother’s age × mother’s malaria status (mage:mmal) as fixed factors. Write down a set of equations that describe this model. ( 4 pts)

  2. Fit the model using lmer and match the parameters in the equations to the estimates in the R output created using the summary function. (4 pts)

  3. Use the check_model function to evaluate whether the model assumptions are reasonably met. Be sure to highlight any potential issues. (4 pts)

  4. Estimate the expected telomere length in the population of females, born in 1988, from mothers that were 4 years old (mmage = 4) and infected with malaria (mmal = 1). (3 pts)

  5. Estimate the expected telomere length for an individual female bird born in 1988 in brood = 4 and dam = W28 when her mother was 4 years old (mage = 4) and infected with malaria (mmal = 1). (3 pts)

References

Kéry, M. (2010). Introduction to WinBUGS for ecologists: Bayesian approach to regression, ANOVA, mixed models and related analyses. Academic Press.
Schielzeth, H., & Forstmeier, W. (2009). Conclusions beyond support: Overconfident estimates in mixed models. Behavioral Ecology, 20(2), 416–420.

  1. A telomere is a region of repetitive DNA sequences at the end of a chromosome.↩︎