Chapter 11 Bayesian Inference

Note: when creating solutions, I recommend adding progress.bar = "gui" (Windows only) or progress.bar = "none" (Mac or Windows users) as an argument to jags; this will cut back on unnecessary output.

11.1 Exercise 1

Consider the jaw length data for male and female golden jackals (Canis aureus) from Section 3.5 of the book.

  1. Refit the simple “t-test” model using JAGS (as shown in Section 12.4 of the book). Then, modify it so that males and females have different variance parameters (one possible model structure is sketched after this list).
  2. Use graphical methods to compare posterior distributions for the difference in means, \(\mu_f-\mu_m\), from the original model (with equal variances) and the model allowing for separate variance parameters. Also compare 95% Bayesian credible intervals for \(\mu_f-\mu_m\) from both models. Are your conclusions robust to the equal-variance assumption? That is, do your conclusions change much when you use the more complex model?
  3. Compare posterior distributions for \(\sigma_m\) and \(\sigma_f\) from the second (heterogeneous variance) model. Comment on any differences.
  4. Compare the prior and posterior distributions for \(\sigma_m\) (the standard deviation for male jaw lengths). How has the data informed us about this parameter?
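
Below is a minimal sketch of one way to code the unequal-variance model in JAGS. The node names (males, females, nmales, nfemales) match those used elsewhere in this chapter, but treat the specific prior choices as assumptions you may want to modify.

jaws2.jags <- "
model{
  # Likelihood: separate means and separate precisions for each sex
  for(i in 1:nmales){
    males[i] ~ dnorm(mu.male, tau.male)
  }
  for(i in 1:nfemales){
    females[i] ~ dnorm(mu.female, tau.female)
  }

  # Priors
  mu.male ~ dnorm(0, 0.001)
  mu.female ~ dnorm(0, 0.001)
  sigma.male ~ dunif(0, 100)
  sigma.female ~ dunif(0, 100)
  tau.male <- 1/(sigma.male * sigma.male)
  tau.female <- 1/(sigma.female * sigma.female)

  # Derived quantity to monitor: difference in means
  mu.diff <- mu.female - mu.male
}"
writeLines(jaws2.jags, "jaws2.txt")  # pass 'jaws2.txt' to jags() via model.file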

11.2 Exercise 2

This exercise will have you compare Frequentist confidence intervals and Bayesian credible intervals using simulated data.

  1. Simulate data where all of the linear model assumptions are reasonable:

    1. Generate 100 random \(x\) values drawn from a Uniform(-5, 5) distribution (hint: use runif).
    2. Generate 100 random errors, \(\epsilon_i\), drawn from a \(N(\mu = 0, \sigma = 4)\) distribution.
    3. Let \(Y_i = \beta_0 + \beta_1 x_i + \epsilon_i\), with \(\beta_0 = 6\) and \(\beta_1 = 3\).
  2. Estimate \(\beta_0\) and \(\beta_1\) using linear regression and save the coefficient estimates and 95% confidence intervals.

  3. Write code to estimate \(\beta_0\) and \(\beta_1\) in a Bayesian framework. Hint: consider \(N(\mu = 0, \tau = 0.001)\) priors for the regression parameters; use a uniform prior for \(\sigma\) and then calculate \(\tau\) from \(\sigma\) within the model, so that the prior for \(\tau\) is induced by the uniform prior on \(\sigma\) (similar to the model we fit to the jaw length data). A sketch of one such model appears after this list.

  4. Use a for loop to repeat the process of simulating data and estimating parameters (and intervals) at least 10 times, keeping track of the point estimates for \(\beta_0\) and \(\beta_1\) and their 95% confidence and 95% credible intervals.

  5. Compare the Frequentist and Bayesian estimates of \(\beta_0\) and \(\beta_1\). Also, compare the widths of the 95% confidence and credible intervals.
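
A minimal sketch of one possible JAGS model, matching the hint in step 3, is given below; the node names (y, x, n, beta0, beta1, sigma) are assumptions that need to match the data you pass to jags.

linreg.jags <- "
model{
  for(i in 1:n){
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta0 + beta1 * x[i]
  }
  beta0 ~ dnorm(0, 0.001)   # vague Normal priors for the regression parameters
  beta1 ~ dnorm(0, 0.001)
  sigma ~ dunif(0, 100)     # uniform prior for sigma
  tau <- 1/(sigma * sigma)  # JAGS parameterizes the Normal using a precision
}"
writeLines(linreg.jags, "linreg.txt")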

Helpful Hint 1: when writing your code for this problem, it is useful to consider which parts of the code need to be inside the loop and which parts can be written outside of it. Anything that must be updated with each iteration belongs inside the loop (e.g., code that generates data, passes data to JAGS, or fits a model to the simulated data). Code that defines constants or objects that do not change with each iteration can live outside the loop (e.g., the true coefficients used to simulate the data, the structure of the JAGS model).

Helpful Hint 2: although you could use two separate for loops (one for generating data and estimating parameters using lm and a second for generating data and estimating parameters using JAGS), it is better to fit both the lm and JAGS models to the same simulated data. Why? Each time you simulate a new data set, you will get a different set of parameter estimates. Using the same data with both lm and JAGS controls for this variability. The benefit is similar to that of collecting paired data, which can be analyzed using a paired t-test, versus two separate samples analyzed using a t-test for a difference in means. A skeleton of this simulation loop is sketched below.
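
The skeleton below assumes the JAGS model above has been written to linreg.txt; the object names and the way results are stored are just one possibility.

library(R2jags)

nsims <- 10
beta0 <- 6; beta1 <- 3; sigma <- 4                    # true values (defined once, outside the loop)
est.lm <- est.jags <- matrix(NA, nsims, 2)            # point estimates
ci.lm <- ci.jags <- array(NA, dim = c(nsims, 2, 2))   # interval endpoints

for(i in 1:nsims){
  # 1. Simulate a data set
  x <- runif(100, -5, 5)
  y <- beta0 + beta1 * x + rnorm(100, 0, sigma)

  # 2. Frequentist fit (lm)
  lmfit <- lm(y ~ x)
  est.lm[i, ] <- coef(lmfit)
  ci.lm[i, , ] <- confint(lmfit)

  # 3. Bayesian fit using the *same* simulated data
  jagsfit <- jags(data = list(y = y, x = x, n = 100),
                  parameters.to.save = c("beta0", "beta1", "sigma"),
                  model.file = "linreg.txt", progress.bar = "none")
  post <- jagsfit$BUGSoutput$summary
  est.jags[i, ] <- post[c("beta0", "beta1"), "mean"]
  ci.jags[i, , ] <- post[c("beta0", "beta1"), c("2.5%", "97.5%")]
}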

11.3 Exercise 3

This exercise was inspired by an example presented in Chapter 11 of Kéry (2010). Kéry simulates length and body mass data for Asp vipers (Vipera aspis) in three different regions of Switzerland. He then shows how easy it is to get misleading results if careful attention is not given to the assumed prior distributions, and he suggests centering and scaling predictor variables as a solution to the problem. Here, instead, we will explore whether we can “fix” the problem using more dispersed priors.


Figure 11.1: Asp Viper (Vipera aspis) female (found by Jean Nicolas). Author: Bernard Dupont from France. https://creativecommons.org/licenses/by-sa/2.0/

  1. Simulate data using Kéry’s code, below:
n.groups <- 3 # number of groups
n.sample <- 10 # number of individuals per group
n <- n.groups * n.sample  # Total number of data points
x <- rep(1:n.groups, rep(n.sample, n.groups)) # Indicator for population
pop <- factor(x, labels = c("Pyrenees", "Massif Central", "Jura"))
length <- runif(n, 45, 70)      # Obs. body length (cm) is rarely less than 45

Xmat <- model.matrix(~ pop*length)
beta.vec <- c(-250, 150, 200, 6, -3, -4)

lin.pred <- Xmat[,] %*% beta.vec    # Value of lin.predictor
eps <- rnorm(n = n, mean = 0, sd = 10)  # residuals 
mass <- lin.pred + eps          # response = lin.pred + residual
vip.dat<-data.frame(length = c(length[1:10], length[11:20], length[21:30]), 
                    mass = c(mass[1:10], mass[11:20],mass[21:30]), 
                    pop = pop )
  2. Plot the data using:
library(ggplot2)  # load ggplot2 for plotting
ggplot(vip.dat, aes(x = length, y = mass, colour = pop)) + 
  geom_point() + geom_smooth(method = "lm")
  3. The parameters used to simulate the data, (-250, 150, 200, 6, -3, -4), correspond to a model formulated using effects coding. Fit a linear model in R using effects coding (the default) and show that you can recover the parameters used to generate the data.

  4. Fit the same model in R, but using means coding.

  5. Fit the same model using means coding in JAGS (one possible model structure is sketched after this list). Use \(Normal(\mu=0, \tau=0.001)\) priors for the intercept and slope parameters. Inspect the Rhat values and verify that you ran enough simulations to ensure convergence. Compare your estimates to those obtained in step 4.

  6. Compare the assumed prior distribution for the intercept and slope parameters to the posterior distributions returned by JAGS. Comment on your findings. You can plot the prior distribution using:

curve(dnorm(x, mean = 0, sd = 1/sqrt(0.001)), from = -300, to = 300)
  7. Consider more dispersed priors for the intercept and slope parameters, specifically \(Normal(\mu=0, \tau=0.00001)\). Plot this prior distribution. Refit the model after updating the prior distributions for the intercept and slope parameters to this more dispersed prior. Again, compare your estimates to those from step 4.

  8. Comment on what you learned from this exercise.
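
A minimal sketch of one way to code the means-coding model from step 5 in JAGS is given below; the node names are assumptions, and pop must be passed as a numeric group index (e.g., as.numeric(vip.dat$pop)).

viper.jags <- "
model{
  for(i in 1:n){
    mass[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha[pop[i]] + beta[pop[i]] * length[i]  # separate intercept and slope per population
  }
  for(j in 1:3){
    alpha[j] ~ dnorm(0, 0.001)   # change 0.001 to 0.00001 for the more dispersed priors in step 7
    beta[j] ~ dnorm(0, 0.001)
  }
  sigma ~ dunif(0, 100)
  tau <- 1/(sigma * sigma)
}"
writeLines(viper.jags, "viper.txt")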

11.4 Exercise 4

This exercise was inspired by and largely follows an exercise in Chapter 4 of Kéry (2010). Save a copy of your code for fitting the simple “t-test” model to the jaw length data set in Exercise 1. Open the copy and explore what can go wrong. Summarize your findings in a short paragraph or two. Note: if you want to create a report showing your errors, you can add the following so that R will still compile a report using knitr even if your code produces errors:

knitr::opts_chunk$set(
  error = TRUE # do not interrupt in case of errors
)
  1. Create a typo in one of the node names when you define your JAGS model (e.g., misspell mu.female). What happens?
  2. Select extremely diffuse priors (e.g., mu.male ~ dnorm(0, 0.00000001)). Does this cause any problems?
  3. Select an initial value for \(\sigma\) outside the range specified by the uniform prior. What happens?
  4. “Forget” to include nmales in the data argument of the call to jags.
  5. Add an observation for a male and a female to the data set, each with a value of NA (missing). Increase the values of nmales and nfemales by 1 to reflect these additions. Add males and females to parameters.to.save so that you can monitor these last observations; the model will impute values for the missing observations. Look at density plots for these observations (e.g., using functions in the mcmcplots package, as sketched below).
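
For the last step, a minimal sketch using the mcmcplots package is given below; the object name jawsfit and the monitored node names males and females are assumptions that should match your own code.

library(mcmcplots)
# Density plots for all monitored nodes in the males[...] and females[...] families,
# including the imputed (NA) observations
denplot(jawsfit, parms = c("males", "females"))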

11.5 Exercise 5

In this exercise, you will compare frequentist confidence intervals for \(E[Y|X]\) and prediction intervals for \(Y_i | X_i\) to Bayesian intervals when fitting linear models. We will again consider the LizardBite data set in the abd library (Middleton & Pruim, 2015). This data set was collected by Lappin & Husak (2005) and featured in a problem in Whitlock & Schluter (2015)’s popular introductory statistics book. Lappin & Husak (2005) were interested in whether the bite force (bite) of male lizards of the species Crotaphytus collaris was predictive of their territory size (territory). You can access the data using:

library(abd)
data(LizardBite)
  1. Fit a linear model using R that could be used to predict territory size from a lizard’s bite force.

  2. Fit the same model using JAGS.

  3. Compare confidence and prediction intervals obtained using the predict function applied to the lm fit to intervals formed from samples of the posterior distribution returned by JAGS (a sketch of one approach is given below).
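
A minimal sketch of one way to carry out the comparison in step 3 is given below, assuming the lm fit from step 1 is stored in lizard.lm and the JAGS fit from step 2 in lizard.jags with monitored nodes beta0, beta1, and sigma (these object and node names are assumptions).

# New bite-force values at which to compare intervals
newdat <- data.frame(bite = seq(min(LizardBite$bite), max(LizardBite$bite), length = 50))

# Frequentist intervals from lm
ci.lm <- predict(lizard.lm, newdata = newdat, interval = "confidence")
pi.lm <- predict(lizard.lm, newdata = newdat, interval = "prediction")

# Bayesian intervals from the posterior samples
post <- lizard.jags$BUGSoutput$sims.matrix
X.new <- cbind(1, newdat$bite)                        # design matrix for the new values
mu.hat <- post[, c("beta0", "beta1")] %*% t(X.new)    # each row = E[Y|X] for one posterior draw
ci.bayes <- apply(mu.hat, 2, quantile, probs = c(0.025, 0.975))

# For prediction intervals, add sampling variability using each draw's sigma
y.new <- apply(mu.hat, 2, function(m) rnorm(length(m), m, post[, "sigma"]))
pi.bayes <- apply(y.new, 2, quantile, probs = c(0.025, 0.975))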

11.6 Exercise 6

Reconsider the moose detection example from Section 11 of the book. As in the book, assume that the number of detections, \(y = 59\), follows a Binomial distribution with \(n = 124\) trials and probability of detection, \(p\). In the book, we considered a Beta(1, 1) prior distribution (equivalent to a uniform distribution) for \(p\), which gave us a posterior distribution for \(p\) that was Beta(60, 66). More generally, if we consider a Beta(\(\alpha, \beta\)) prior for \(p\), we end up with a posterior distribution that is Beta(\(\alpha^\prime, \beta^\prime\)) with \(\alpha^\prime = \alpha + y\) and \(\beta^\prime = \beta + n - y\).

For this exercise, consider the following priors for \(p\):

  • Beta(0.1, 0.4)
  • Beta(2, 2)
  • Beta(50, 50)
  • Beta(0.7, 0.09)

For each of the above prior distributions:

  1. Plot the prior and posterior distribution for \(p\).

  2. Estimate the mean of the posterior distribution, \(\hat{p} = \alpha^\prime/(\alpha^\prime + \beta^\prime)\), and a 95% credible interval for \(p\) using the qbeta function in R (see the sketch after this list).

  3. Now, consider a second data set containing the number of tails, \(y = 10\), resulting from \(n = 30\) coin flips. Repeat steps 1 and 2 above with these new data.

  4. Comment on your findings. How important is the choice of prior distribution when the sample size is large versus small? Which prior distribution would you prefer to use in step 3?
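
A minimal sketch for one of the priors applied to the moose data is given below; the same pattern can be reused for the other priors and for the coin-flip data in step 3.

y <- 59; n <- 124      # moose detections
a <- 2; b <- 2         # Beta(2, 2) prior
a.post <- a + y        # posterior parameters
b.post <- b + n - y

curve(dbeta(x, a.post, b.post), from = 0, to = 1, ylab = "Density")  # posterior
curve(dbeta(x, a, b), lty = 2, add = TRUE)                           # prior (dashed)

a.post / (a.post + b.post)                 # posterior mean
qbeta(c(0.025, 0.975), a.post, b.post)     # 95% credible interval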

11.7 Exercise 7

Harder and Thomson (1989) quantified the amount of pollen removed by bumblebee queens and honeybee workers as a function of the time they spent at flowers. These data can be accessed using:

library(Data4Ecologists)
data(beepollen)

The data set has three variables:

  • removed = proportion of the pollen removed
  • duration = duration of visit (in seconds)
  • queen = 1 if the bee was a queen and 0 otherwise
  1. Use JAGS to fit the following model:

\[\begin{gather} removed_i \sim N(\mu_i, \sigma^2)\\ \mu_i = (\beta_1 + \beta_2 I(queen = 1)_i)\left(1 - \exp(-\beta_3 duration_i)\right) \end{gather}\]

Note: the mean response will asymptote at either \(\beta_1\) (for worker bees) or \(\beta_1 + \beta_2\) (for queen bees), and the parameter \(\beta_3\) controls how fast the curve approaches its asymptote. You will need to specify priors; one option is to place priors on \(\log(\beta_1)\) and \(\log(\beta_3)\) so that \(\beta_1\) and \(\beta_3\) are guaranteed to be positive (a sketch of one possible JAGS model is given after this list).

  2. Create a plot of the data, overlaying the estimated mean response curve from the model fit in step 1 separately for queen and non-queen bees (i.e., plot \(E[removed | duration, queen]\) and \(E[removed | duration, not queen]\) versus \(duration\)).

  3. Add pointwise confidence/credible intervals to your estimated response curves.
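
A minimal sketch of one possible JAGS implementation of the model in step 1 is given below; the node names and the specific priors (including the uniform prior for sigma) are assumptions you may want to change.

pollen.jags <- "
model{
  for(i in 1:n){
    removed[i] ~ dnorm(mu[i], tau)
    mu[i] <- (beta1 + beta2 * queen[i]) * (1 - exp(-beta3 * duration[i]))
  }

  # Priors: log-scale priors keep beta1 and beta3 positive
  log.beta1 ~ dnorm(0, 0.001)
  log.beta3 ~ dnorm(0, 0.001)
  beta1 <- exp(log.beta1)
  beta3 <- exp(log.beta3)
  beta2 ~ dnorm(0, 0.001)

  sigma ~ dunif(0, 10)
  tau <- 1/(sigma * sigma)
}"
writeLines(pollen.jags, "pollen.txt")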

References

Harder, L. D., & Thomson, J. D. (1989). Evolutionary options for maximizing pollen dispersal of animal-pollinated plants. The American Naturalist, 133(3), 323–344.
Kéry, M. (2010). Introduction to WinBUGS for ecologists: Bayesian approach to regression, ANOVA, mixed models and related analyses. Academic Press.
Lappin, A. K., & Husak, J. F. (2005). Weapon performance, not size, determines mating success and potential reproductive output in the collared lizard (Crotaphytus collaris). The American Naturalist, 166(3), 426–436.
Middleton, K. M., & Pruim, R. (2015). Abd: The analysis of biological data. Retrieved from https://CRAN.R-project.org/package=abd
Whitlock, M., & Schluter, D. (2015). The analysis of biological data. Roberts Publishers.
