Chapter 14 Linear Mixed Models
14.1 Exercise 1
This exercise closely follows by a problem in Kéry (2010) (exercise 1 of Chapter 12). He points out that there are only slight differences in how one specifies random effects and fixed effects in JAGS. When using fixed effects, each intercept (and or slope) is assigned a prior, usually a normal distribution centered on 0 with a small precision parameter. When using random effects, intercepts (and or slopes) are assigned the same prior, usually a normal distribution with a non-zero mean and precision parameter - these mean and precision parameters are in turn assigned hyperpriors.
- To help see the distinction between fixed and random effects modify the JAGS code used to fit the random intercepts model to the
pines
data set in Section 18 of the book (code below) so that it is equivalent to fitting the following fixed effects model:
\[\begin{gather} dbh_{ij} \sim N(\mu_i, \sigma^2) \\ \mu_i = \beta_{0j} + \beta_1 agec_{ij}, \end{gather}\]
Here, \(agec_{ij}\) is the (scaled and centered) age associated with the \(i^{th}\) tree at the \(j^{th}\) site and \(dbh_{ij}\) is the diameter at breast height for with the \(i_{th}\) tree at the \(j^{th}\) site. The model has 22 parameters:
- \(\beta_{0j}\) (for \(j = 1, 2, \ldots, 20\)) are site-specific intercepts
- \(\beta_1\) is the slope associated with
agec
- \(\sigma^2\) describes the variance of the errors
jags.lme.alt<-function(){
# Priors for the intercepts
for (i in 1:n.groups){
alpha[i] ~ dnorm(beta0, tau.b0) # Random intercepts
}
# Hyper priors that describe the distribution of the site-specific intercepts
beta0 ~ dnorm(0, 0.001) # Mean intercept
tau.b0 <- 1 / (sigma.b0 * sigma.b0)
sigma.b0 ~ dunif(0, 100) # SD of the random intercepts
# Priors for fixed effect regression parameters
beta1 ~ dnorm(0, 0.001) # Common slope agec
tau.eps <- 1 / ( sigma.eps * sigma.eps) # Residual precision
sigma.eps ~ dunif(0, 100) # Residual standard deviation
# Likelihood
for (i in 1:nobs) {
dbh[i] ~ dnorm(mu[i], tau.eps) # The random variable
mu[i] <- alpha[site[i]] + beta1*agec[i] # Expectation
}
}
- Compare your estimates from JAGS to those obtained when fitting an equivalent model using
lm
in R:
lm.fixed <- lm(dbh ~ agec + site - 1, data = pines)
14.2 Exercise 2
Mark Ditmer, a former post-doc at the University of Minnesota, along with collaborators at the UMN and MN DNR, used a combination of GPS collars and heart rate monitors to explore relationships between movement and heart rates of black bears in Minnesota. For this exercise, you will explore the bearmove
data set in the Data4Ecologists
package, which can be accessed using:
This data set consists of observations from 9 different ‘bear-years’ (unique combinations of bears and years). The goal will be to characterize how log(movement rate) and Season influence log(heart rate). We will fit models that include Season, log(movement rate), and the interaction between log(movement rate) and Season as predictor variables.
Create a centered and scaled version of
log.move
and name this variablelog.move.sc
. Use thelm
function to fit the following linear modellog(hr) ~ Season*log.move.sc
- under the naive assumption that ALL observations are independent. Interpret the coefficients in the fitted model.Calculate 95% confidence intervals for the coefficients using the
confint
function. Note: these intervals assume the observations are independent (a poor assumption).Assume that observations from different bears, and also observations from the same bear but taken in different years, are independent. Fit a random intercepts model to the data using
lmer
in thelme4
package:lmer(log(hr) ~ season*log.move.sc + (1 | BearIDYear), data = )
to specify a random intercept for each “bear-year.” Calculate 95% confidence intervals for the fixed effects parameters usingconfint(model.name, method="Wald")
. These are asymptotic, normal-based intervals. Alternatively, we could use profile-based (method = "profile"
) or bootstrap-based (method = "boot"
) intervals. These latter methods are preferred, but they are computationally intensive and will sometimes fail.Fit a random coefficients model to the data using
lmer(log(hr) ~ log.move.sc*Season + (log.move.sc*Season|BearIDYear), data= )
. Calculate 95% confidence intervals for the fixed effects parameters usingconfint(model.name, method="Wald")
. Note: if you try to fit this model usinglog.move
rather thanlog.move.sc
,lmer
will fail to converge (centering and scaling can be really important when fitting random coefficient models!).Explore whether the assumptions of the mixed effects models in steps c) and d) are reasonably met. Specifically, consider residual versus fitted value plots and qqplots of the random effects and within-bear residuals. The
check_model
function in theperformance
library is a good choice for exploring potential assumption violations.Plot residuals from the model versus Julian date for each bear. One tip here - you can use the
fortify.merMod
function to create a data set containing all of the information inbearmove
, along with fitted values and residuals from your fitted models. Theaugment
function in thebroom.mixed
package will also do this for you. You should find that the within-individual residuals are highly autocorrelated, possibly a result of missing one or more important predictors. We could try to allow for a smooth effect of Julian day or allow the errors to be autocorelated. An alternative option for characterizing the population-level relationship between movement and heart rates would be to fit a simple linear model (as in step (a)) and then use a cluster-level bootstrap for inference. The advantage of this approach is that we do not have to make assumptions about normally distributed random effects or independent and normally distributed within-individual residuals. The disadvantage of this approach is that it will be conservative when estimating confidence intervals for coefficients associated with variables that are not constant within a cluster. Calculate SE’s and confidence intervals using a cluster-level bootstrap.Compare your confidence intervals from steps 2, 3, 4, and 6. Do these results make sense in light of the findings of Schielzeth & Forstmeier (2009) and also what we know about how the data were collected (i.e., that we observed the same animals repeatedly over time)?
Optional further exploration: There are other model formulations that we could consider in addition to the random intercept and random intercept and slope models fit in steps 2 and 3. You might consider the following models below:
lmer(log.hr ~ log.move.sc*Season + (log.move.sc | BearIDYear:Season), data = )
lmer(log.hr ~ log.move.sc*Season + (log.move.sc | BearIDYear) + (log.move.sc | BearIDYear:Season), data = )
Inspect the summary output for these models and reflect on:
- The induced dependence structure of the observations
- The number of variance parameters estimated by the model
Hint: One way to gain insights into the implications of these different formulations is to consider the structuring variables used to model the random effects, i.e., BearIDYear
and BearIDYear:Season
. Each unique value of these structuring variables leads to a random deviation from the overall mean parameter (i.e., mean intercept or mean slope). Observations that share a grouping variable will be correlated. Furthermore, when there are two grouping variables, we end up with varying levels of dependency; observations that share values for both grouping variables will be more correlated than observations that share only one of the two grouping variables.
14.3 Exercise 3
The length of telomeres8 decreases as individuals get older and this shortening has been associated with aging. Consider the birdmalariaTL
data set in the Data4Ecologists
library. These data are associated with the following paper:
Asghar, M., Hasselquist, D., Hansson, B., Zehtindjiev, P., Westerdahl, H., & Bensch, S. (2015). Hidden costs of infection: chronic malaria accelerates telomere degradation and senescence in wild birds. Science, 347(6220), 436-438.
The data set contains the following variables:
LFS
= lifetime reproductive success (number of offspring produced)Sex
of bird (0 if male and 1 if female)year
= year of birthAge
= age of the bird at the time of the observationinfected
= infected status (0 = uninfected thoughout life, 1 = infected at 1 year of life and remained infected throughout life)TL
= telomere length
We can access the data accessed using:
Fit a random intercept model to test whether the rate of decrease in telomere length as birds age differs between infected and non-infected birds after adjusting for any differences due to the sex of the individual. In other words, fit a model that contains
Age
,Sex
,infected
, and an interaction betweenAge
andinfected
. Also, include a random intercept associated with each individual (Ind
) in an attempt to deal with statistical dependencies resulting from having multiple measurements on each individual. Using thesummary
function, inspect the estimated parameters and make a conclusion based on this information. You might consider confidence intervals for model parameters or formal hypothesis tests (e.g., provided by thelmerTest
package).Schielzeth & Forstmeier (2009) cautioned against using random intercept (only) models when estimating effects of predictors that vary within an individual. Let’s consider the different predictors in the model. The variables
Sex
andinfected
take on a single value for each individual. Therefore, they are not candidates for including random slopes. The variableAge
, however, takes on multiple values for each individual. Fit a random intercept and slope model that allows the rate of decrease in telomere length as individuals age to vary across individuals. Do not make theAge:infected
interaction random (this column of the design matrix will be 0 for all males and therefore will be constant for these individuals). Compare the two models in terms of:- Estimates of the
Age
andAge:infected
coefficients. - Degrees of freedom associated with these coefficients when using the
lmerTest
package. - Confidence intervals for the coefficients.
- Estimates of the
Comment on any differences. Are they what you would expect?
Compare the models from steps 1 and 2 using AIC or a likelihood ratio test. Which model do you prefer and why?
Write down the model using a set of equations and identify all of the model parameters in the R output created using the
summary
function.Use the
check_model
function to evaluate whether the model assumptions are reasonably met. Be sure to highlight any potential issues.Use the model from step 2 to predict the expected telomere length for an infected female that is 4 years old.
14.4 Exercise 4
Consider the data set birdmalariaLFS
in the Data4Ecologists
library. These data are associated with the following paper:
Asghar, M., Hasselquist, D., Hansson, B., Zehtindjiev, P., Westerdahl, H., & Bensch, S. (2015). Hidden costs of infection: chronic malaria accelerates telomere degradation and senescence in wild birds. Science, 347(6220), 436-438.
The authors were interested in whether chronic infection might impact the fitness of wild great reed warblers. The data set contains the following variables:
ID
= individual bird (offspring) identity at birthbrood
= unique brood identifier (birds born to the same mother will be in the same brood)broodyear
= year of birthSex
= Sex of the bird (0 if male, 1 if female)offBTL
= Offspring ealy-life telomere length (at 9 day age)mmal
= Mother malaria status at the time of breeding (1 if infected, 0 otherwise)mage
= Mother age at the time of breedingdam
= Unique identifier for each motherSire
= Unique identifier for each fatherfmal
= Father malaria status at the time of breeding (1 if infected, 0 otherwise)fage
= Father age at time of breeding
The data set can be accessed using:
The authors state that they fit a linear mixed effect model with early-life telomere length of individual offspring [
OffBTL
] as the dependent variable, mother’s ID (dam
) and brood ID (brood
) as random factors, and brood year (broodyear
), offspring sex (Sex
), mother’s age (mage
), mother’s malaria status (mmal
) and the interaction mother’s age × mother’s malaria status (mage:mmal
) as fixed factors. Write down a set of equations that describe this model. ( 4 pts)Fit the model using
lmer
and match the parameters in the equations to the estimates in the R output created using thesummary
function. (4 pts)Use the
check_model
function to evaluate whether the model assumptions are reasonably met. Be sure to highlight any potential issues. (4 pts)Estimate the expected telomere length in the population of females, born in 1988, from mothers that were 4 years old (
mmage
= 4) and infected with malaria (mmal
= 1). (3 pts)Estimate the expected telomere length for an individual female bird born in 1988 in brood = 4 and dam = W28 when her mother was 4 years old (
mage
= 4) and infected with malaria (mmal
= 1). (3 pts)
References
A telomere is a region of repetitive DNA sequences at the end of a chromosome.↩︎