survdat<-data.frame(survival_times = c(12, 15, 22, 24, 24, 32, 32, 33, 34, 38, 38, 43, 44,
48, 52, 53, 54, 54, 55, 56, 57, 58, 58, 59, 60, 60,
60, 60, 61, 62, 63, 65, 65, 67, 68, 70, 70, 72, 73,
75, 76, 76, 81, 83, 84, 85, 87, 91, 95, 96, 98, 99,
109, 110, 121, 127, 129, 131, 143, 146, 146, 175, 175, 211, 233,
258, 258, 263, 297, 341, 341, 376))10 Maximum Likelihood
Exercise 1
For this problem, you will work with data on the survival times (in days) of 72 guinea pigs infected with virulent tubercle bacilli (Bjerkedal et al. 1960):
You can read in the data using the following R code:
We will begin by fitting an exponential distribution to these data. The probability density function for the exponential distribution is given by: dexp in R). Using this information and the above data to:
Calculate L
and log[L ] for values of between 0.0001 and 0.05 in increments of 0.0005.Plot L
and log[L ] versus .Use
optimto estimate . For this step, you will want to create a function that calculates the negative log-likelihood since by default,optimfinds the minimum and not the maximum.Plot the fitted exponential distribution on top of a histogram of the data.
Calculate a 95% confidence interval for
using the inverse of the Hessian and a Normal approximation for the sampling distribution ofOptional: Calculate a profile likelihood confidence interval and compare it to the interval based on the Normal distribution from step 5.
Exercise 2
For this exercise, we will use maximum likelihood to fit functional-response models to predator-prey data to describe how prey intake rates by predators vary with prey density. We will consider data for two species of freshwater amphipods (Gammarus spp.) that eat black fly larvae (Simulium spp.). The data are contained in the frair package (Pritchard 2017) and can be loaded using:
#install.packages("frair") # if not already installed
library(frair)
data(gammarus)Plot the number of prey eaten (
eaten), versus prey-density (density). For now, ignore that the data consist of two different prey species. Note that: a) the number of prey eaten increases with prey-density; b) there appears to be some “leveling off” of the curve with the rate of increase in prey eaten slowing down as prey density nears its maximum observed value; c) the variation in the response appears to increase with prey density.Assume the per capita predation rate on prey exhibits a Holling type II functional response described by the following model:
where
Further, assume the number of prey actually eaten in any given experiment,
This implies that the expected total number of prey eaten will be given by:
Thus,
Calculate the expected value of
Overlay these expected values on the scatterplot of the data.
Write a function to calculate the likelihood of the data under this model, noting
are recorded in theeatenvariable and the in thedensityvariable of thegammarusdata set.Use
optimto determine maximum likelihood estimates of the parameters and . For starting values, use . Plot the expected number eaten on the scatterplot of the data using the maximum likelihood estimates of and .Use the delta method to calculate a 95% confidence interval for
for ranging from 1 to 30. Add lines depicting this confidence interval to your scatterplot.If you also wanted to calculate a prediction interval, what would you need to consider? Share any ideas you might have for how you could calculate a prediction interval for
(as opposed to a confidence interval for ).
Exercise 3
For this exercise, we will analyze survival data collected for a parasitoid (Aphelinus certus) of the soybean aphid (Aphis glycines) over a range of host densities using a laboratory assay (Miksanek and Heimpel 2020). The data can be accessed using:
library(Data4Ecologists)
data(parasitoidlifespan)If you read Miksanek and Heimpel (2020), you will see that lifespans depend heavily on host density. To begin with, we will just consider data from experiments where the number of hosts were plentiful. Create a data set named
lifespan2that only includes observations wheretreatment(i.e., host density) = 10 or 50.There are some observations with incomplete lifespans due to individuals that were sacrificed along the way. These have values of
status= -1 (parasitoid lost at start of the study) or 0 (parasitoid lost during assay, censor data). Create a data setlifespan3that only includes observations wherestatus = 1(no issues).Assume the distribution of lifespans in the
lifespans3data set can be described by a gamma distribution with probability density function given by: (denoted bydgammain R). In this parameterization, is referred to as the shape parameter and as a scale parameter. Useoptimto estimate and . To get starting values, note that:
We can use what is called the method of momemts to get preliminary estimates of
- Plot the fitted gamma distribution on top of a histogram of the data.
Additional Optional Steps:
Although we dropped observations with
status = 0, these observations can be included in the analysis and inform our estimate of . Specifically, an individual sacrificed at time is known to have survived at least that long. Whereas observed ages at death contribute to the likelihood, individuals that are sacrificed should contribute to the likelihood (recall, so – i.e., these right-censored observations tell us that the individual will live at least as long as ; see e.g., Fieberg and DelGiudice (2011)). Create a new data setlifespan4that starts withlifespan2and only drops observations withstatus = -1. Modify your likelihood function to also consider these additional observations. Hint:pgammagives the cumulative distribution function for the gamma distribution, . So,log(1-pgamma(x, shape, scale)will give the contributions to the log-likelihood for these observations.Plot the fitted gamma distribution when making use of the right-censored observations. Note any changes.
Use the Hessian to calculate SEs for both
and . Do this for both the original likelihood and the likelihood that incorporated the right-censored observations. How have the SEs changed? Comment on what you have found.
Exercise 4
Harder and Thompson (1989)1 quantified the amount of pollen removed by bumblebee queens and honeybee workers as a function of the time they spent at flowers. These data can be accessed using:
library(Data4Ecologists)
data(beepollen)The data set has three variables:
removed= proportion of the pollen removedduration= duration of visit (in seconds)queen= 1 if the been was a queen bee and 0 otherwise
- Use optim to fit the following model:
Note: the mean response pattern will asymptote at either optim, you will need good starting values. You should be able to guess at the asympotote by plotting the data. For
Create a plot of the data, overlaying the estimated mean response curve from step 1 separately for queen and non-queen bees (i.e., plot
and versus ).Add pointwise confidence/credible intervals to your estimated response curve in step 2 using the delta method.
References
Harder, L. D., & Thomson, J. D. (1989). Evolutionary options for maximizing pollen dispersal of animal-pollinated plants. The American Naturalist, 133(3), 323-344.↩︎