Chapter 10 Maximum Likelihood

10.1 Exercise 1

For this problem, you will work with data on the survival times (in days) of 72 guinea pigs infected with virulent tubercle bacilli (Bjerkedal et al., 1960):

You can read in the data using the following R code:

survdat<-data.frame(survival_times = c(12, 15, 22, 24, 24, 32, 32, 33, 34, 38, 38, 43, 44,
48, 52, 53, 54, 54, 55, 56, 57, 58, 58, 59, 60, 60,
60, 60, 61, 62, 63, 65, 65, 67, 68, 70, 70, 72, 73,
75, 76, 76, 81, 83, 84, 85, 87, 91, 95, 96, 98, 99,
109, 110, 121, 127, 129, 131, 143, 146, 146, 175, 175, 211, 233,
258, 258, 263, 297, 341, 341, 376))

We will begin by fitting an exponential distribution to these data. The probability density function for the exponential distribution is given by: $f (Y) = λ e^{- λ Y}$ (denoted by dexp in R). Using this information and the above data to:

Calculate L $(λ | x)$ and log[L $(λ | x)$ ] for values of $λ$ between 0.0001 and 0.05 in increments of 0.0005.
Plot L $(λ | x)$ and log[L $(λ | x)$ ] versus $λ$ .
Use optim to estimate $\hat{λ}$ . For this step, you will want to create a function that calculates the negative log-likelihood since by default, optim finds the minimum and not the maximum.
Plot the fitted exponential distribution on top of a histogram of the data.
Calculate a 95% confidence interval for $λ$ using the inverse of the Hessian and a Normal approximation for the sampling distribution of $\hat{λ}$
Optional: Calculate a profile likelihood confidence interval and compare it to the interval based on the Normal distribution from step 5.

10.2 Exercise 2

For this exercise, we will use maximum likelihood to fit functional-response models to predator-prey data to describe how prey intake rates by predators vary with prey density. We will consider data for two species of freshwater amphipods (Gammarus spp.) that eat black fly larvae (Simulium spp.). The data are contained in the frair package (Pritchard, 2017) and can be loaded using:

#install.packages("frair") # if not already installed
library(frair)
data(gammarus)

Plot the number of prey eaten (eaten), versus prey-density (density). For now, ignore that the data consist of two different prey species. Note that: a) the number of prey eaten increases with prey-density; b) there appears to be some “leveling off” of the curve with the rate of increase in prey eaten slowing down as prey density nears its maximum observed value; c) the variation in the response appears to increase with prey density.
Assume the per capita predation rate on prey exhibits a Holling type II functional response described by the following model:

$F (N) = a / (1 + a h N),$

where $F (N)$ gives the probability an individual prey will succumb to predation given the number of prey available to the predators, $N$ . The model has two parameters: $a$ is the attack rate, reflecting the search efficiency of predators and $h$ is the handling time needed by predators to consume prey they capture. A type II response implies that the per-capita predation rate decreases hyperbolically with density and that the total number of prey eaten will asymptote as the number of prey increases.

Further, assume the number of prey actually eaten in any given experiment, $K_{i}$ , is a binomial random variable:

$K_{i} \sim B i n o m i a l (N_{i}, p_{i} = a / (1 + a h N_{i}))$

This implies that the expected total number of prey eaten will be given by:

$E [K_{i} | N_{i}] = \frac{N_{i} a}{1 + a h N_{i}} = \frac{a}{1 / N_{i} + a h}$

Thus, $E [K_{i} | N_{i}]$ should asymptote at $1 / h$ (as $N \to \infty$ ). This result can be useful for determining appropriate starting values for $h$ (e.g., 1/max( $K_{i}$ ) may be often be a reasonable choice).

Calculate the expected value of $K$ for $N$ ranging from 1 to 30 for the following parameter values:

$a = 3, h = 1 / 30$
$a = 1, h = 1 / 20$
$a = 0.5, h = 1 / 30$

Overlay these expected values on the scatterplot of the data.

Write a function to calculate the likelihood of the data under this model, noting $K_{i}$ are recorded in theeaten variable and the $N_{i}$ in the density variable of the gammarus data set.
Use optim to determine maximum likelihood estimates of the parameters $a$ and $h$ . For starting values, use $h = 1 / 30, a = 0.5$ . Plot the expected number eaten on the scatterplot of the data using the maximum likelihood estimates of $a$ and $h$ .
Use the delta method to calculate a 95% confidence interval for $E [K_{i} | N_{i}]$ for $N_{i}$ ranging from 1 to 30. Add lines depicting this confidence interval to your scatterplot.
If you also wanted to calculate a prediction interval, what would you need to consider? Share any ideas you might have for how you could calculate a prediction interval for $K_{i} | N_{i}$ (as opposed to a confidence interval for $E [K_{i} | N_{i}]$ ).

10.3 Exercise 3

For this exercise, we will analyze survival data collected for a parasitoid (Aphelinus certus) of the soybean aphid (Aphis glycines) over a range of host densities using a laboratory assay (Miksanek & Heimpel, 2020). The data can be accessed using:

library(Data4Ecologists)
data(parasitoidlifespan)

If you read Miksanek & Heimpel (2020), you will see that lifespans depend heavily on host density. To begin with, we will just consider data from experiments where the number of hosts were plentiful. Create a data set named lifespan2 that only includes observations where treatment (i.e., host density) = 10 or 50.
There are some observations with incomplete lifespans due to individuals that were sacrificed along the way. These have values of status = -1 (parasitoid lost at start of the study) or 0 (parasitoid lost during assay, censor data). Create a data set lifespan3 that only includes observations where status = 1 (no issues).
Assume the distribution of lifespans in the lifespans3 data set can be described by a gamma distribution with probability density function given by: $f (y; α, β) = \frac{1}{Γ (α)} y^{α - 1} β^{α} \exp (- β y)$ (denoted by dgamma in R). In this parameterization, $α$ is referred to as the shape parameter and $β$ as a scale parameter. Use optim to estimate $α$ and $β$ . To get starting values, note that:

$E [Y] = \frac{α}{β}$
$V a r [Y] = \frac{α}{β^{2}}$

We can use what is called the method of momemts to get preliminary estimates of $α$ and $β$ by solving these two equations in terms of $α$ and $β$ to give:

$α_{s t a r t} = E [Y]^{2} / V a r [Y]$ and $β_{s t a r t} = V a r [Y] / E [Y]$ .

Plot the fitted gamma distribution on top of a histogram of the data.

Additional Optional Steps:

Although we dropped observations with status = 0, these observations can be included in the analysis and inform our estimate of $λ$ . Specifically, an individual sacrificed at time $t$ is known to have survived at least that long. Whereas observed ages at death contribute $f (t_{i}) = \frac{1}{Γ (α)} t_{i}^{α - 1} β^{α} \exp (- β t_{i})$ to the likelihood, individuals that are sacrificed should contribute $1 - F (t_{i})$ to the likelihood (recall, $F (t) = P (Y \leq t)$ so $1 - F (t) = P (Y \geq t)$ – i.e., these right-censored observations tell us that the individual will live at least as long as $t_{i}$ ; see e.g., Fieberg & DelGiudice (2011)). Create a new data set lifespan4 that starts with lifespan2 and only drops observations with status = -1. Modify your likelihood function to also consider these additional observations. Hint: pgamma gives the cumulative distribution function for the gamma distribution, $F (t)$ . So, log(1-pgamma(x, shape, scale) will give the contributions to the log-likelihood for these observations.
Plot the fitted gamma distribution when making use of the right-censored observations. Note any changes.
Use the Hessian to calculate SEs for both $\hat{α}$ and $\hat{β}$ . Do this for both the original likelihood and the likelihood that incorporated the right-censored observations. How have the SEs changed? Comment on what you have found.

10.4 Exercise 4

Harder and Thompson (1989)⁵ quantified the amount of pollen removed by bumblebee queens and honeybee workers as a function of the time they spent at flowers. These data can be accessed using:

library(Data4Ecologists)
data(beepollen)

The data set has three variables:

removed = proportion of the pollen removed
duration = duration of visit (in seconds)
queen = 1 if the been was a queen bee and 0 otherwise

Use optim to fit the following model:

$\begin{matrix} r e m o v e d_{i} \sim N (μ_{i}, σ^{2}) \\ μ_{i} = (β_{1} + β_{2} I (q u e e n = 1)_{i}) * (1 - e x p (- β_{3} * d u r a t i o n_{i})) \end{matrix}$

Note: the mean response pattern will asymptote at either $β_{1}$ (for worker bees) or $β_{1} + β_{2}$ (for queen bees). The parameter $β_{3}$ will control how fast the curve approaches the asymptote. If you use optim, you will need good starting values. You should be able to guess at the asympotote by plotting the data. For $β_{3}$ , a value around 0.07 should work well.

Create a plot of the data, overlaying the estimated mean response curve from step 1 separately for queen and non-queen bees (i.e., plot $E [r e m o v e d | d u r a t i o n, q u e e n]$ and $E [r e m o v e d | d u r a t i o n, n o t q u e e n]$ versus $d u r a t i o n$ ).
Add pointwise confidence/credible intervals to your estimated response curve in step 2 using the delta method.

References

Bjerkedal, T. et al. (1960). Acquisition of resistance in guinea pies infected with different doses of virulent tubercle bacilli. American Journal of Hygiene, 72(1), 130–48.

Fieberg, J., & DelGiudice, G. D. (2011). Estimating age-specific hazards from wildlife telemetry data. Environmental and Ecological Statistics, 18(2), 209–222.

Miksanek, J. R., & Heimpel, G. E. (2020). Density-dependent lifespan and estimation of life expectancy for a parasitoid with implications for population dynamics. Oecologia, 194(3), 311–320.

Pritchard, D. (2017). Frair: Tools for functional response analysis. Retrieved from https://CRAN.R-project.org/package=frair

Harder, L. D., & Thomson, J. D. (1989). Evolutionary options for maximizing pollen dispersal of animal-pollinated plants. The American Naturalist, 133(3), 323-344.↩︎