Chapter 9 Introduction to Probability Distributions

9.1 Exercise 1

This exercise will consider the 4 basic functions used to work with distributions in the context of the Normal distribution with mean 0 and standard deviation 1. The functions are named dnorm, pnorm, qnorm, rnorm and are depicted in Figure 9.1.


Figure 9.1: Figure from Jack Weiss’s class notes (https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/lectures/lecture13.htm) depicting the 4 basic probability functions for working with the Normal distribution in R.
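
As a minimal sketch of how the four functions relate to one another, the following calls use the standard Normal distribution. The object names and the seed are illustrative assumptions, not part of the exercise.

```r
# Density: height of the N(0, 1) curve at x = 0
d0 <- dnorm(0, mean = 0, sd = 1)

# Cumulative probability: P(X <= 1.96)
p196 <- pnorm(1.96, mean = 0, sd = 1)

# Quantile: the value q such that P(X <= q) = 0.975 (the inverse of pnorm)
q975 <- qnorm(0.975, mean = 0, sd = 1)

# Random draws: 5 simulated values (seed chosen arbitrarily for reproducibility)
set.seed(1)
draws <- rnorm(5, mean = 0, sd = 1)

# Upper-tail and interval probabilities both follow from pnorm
upper_tail <- 1 - pnorm(1.96)           # P(X > 1.96)
middle <- pnorm(1.96) - pnorm(-1.96)    # P(-1.96 < X < 1.96)
```

Changing the mean and sd arguments adapts the same calls to any other Normal distribution.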

Assume women’s heights follow a normal distribution with mean of 64 inches and standard deviation 3 inches. Assume men’s heights follow the same distribution but with an average of 70 inches. Use R to:

  1. Find the probability that a randomly chosen female is less than 60 inches tall.

  2. Find the probability a randomly chosen male is at least 72 inches tall.

  3. Find the height, x, such that only 2% of females are shorter than x.

  4. Find the proportion of males between 68 and 71 inches tall.

  5. Plot the distributions for both males and females (i.e., both normal distributions) on the same plot. You can do this by:

     1. Creating an object, x, that spans a range of heights, e.g., x <- seq(50, 80, length = 1000).
     2. Using plot(x, dnorm(x, mean = , sd = )) for the first density and lines(x, dnorm(x, mean = , sd = )) for the second. Alternatively, use the curve function twice, with add = TRUE for the second density (e.g., curve(dnorm(x, mean = , sd = ), from = 50, to = 80)).

  6. Generate 5000 random male heights and 5000 random female heights. Combine the data and create a smooth histogram of the simulated heights using the density function. Does the distribution appear to be Gaussian?
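
The plotting and simulation steps above can be sketched as follows. The means, standard deviation, plotting range, and seed are hypothetical values chosen for illustration; they are not the values from the exercise.

```r
# Hypothetical parameters (not those of the exercise)
mean_a <- 10
mean_b <- 14
sd_ab <- 2

# Two normal densities on one plot: the first curve() call draws the axes,
# and add = TRUE overlays the second density on the same axes
curve(dnorm(x, mean = mean_a, sd = sd_ab), from = 0, to = 24,
      xlab = "x", ylab = "Density")
curve(dnorm(x, mean = mean_b, sd = sd_ab), from = 0, to = 24,
      add = TRUE, lty = 2)

# Simulate from each distribution, pool the draws, and
# plot a kernel density estimate (a "smooth histogram")
set.seed(1)
pooled <- c(rnorm(5000, mean_a, sd_ab), rnorm(5000, mean_b, sd_ab))
dens <- density(pooled)
plot(dens, main = "Kernel density estimate of pooled draws")
```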

9.2 Exercise 2

The Minnesota DNR decides to monitor various wildlife species using remotely triggered camera traps. Cameras are set up at various locations throughout the state. When an animal passes in front of the camera, the camera is “triggered”, and a picture of the animal is recorded. Assume the number of pictures of white-tailed deer recorded in a week can be described by a negative binomial distribution with \(\mu=3\) and \(\theta = 1\) (note, \(\theta\) is referred to as size in R’s dnbinom function). Use dnbinom, qnbinom, rnbinom, and pnbinom to answer the following questions:

  1. What is the probability that a camera trap catches at least 1 deer in a week?
  2. What is the probability that a camera trap catches between 3 and 5 deer in a week?
  3. Calculate the number of pictures of deer in a week, \(x\), such that 90% of the camera traps record fewer pictures than \(x\).
  4. Simulate a data set with 100 different camera traps (with a negative binomial distribution and parameters \(\mu=3\) and \(\theta = 1\)). Compare the distribution of the simulated data to the negative binomial distribution with \(\mu\) = 3 and \(\theta\) = 1. You can adapt methods used in class to do this - or consider using the following steps:

     1. Create a histogram using the hist function in R, but make sure to include the argument probability=TRUE. To make sure everything lines up correctly, I also recommend using the breaks argument to make sure that there is a bin for each integer value. It also helps to have bins centered at 0, 1, 2, etc. This can be accomplished by adding: (breaks=seq(-0.5, max(object containing your random numbers)+1, by =1)).

     2. Create an object x <- seq(0, max(object containing your random numbers),1).

     3. Overlay the negative binomial’s probability mass function using lines(x, dnbinom(x, mu = 3, size = 1), type = "h").
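
The three steps suggested above might be put together as in the sketch below. The object name counts, the seed, and the axis label are assumptions made for illustration. The last line is an extra check, not part of the suggested steps, showing that pnbinom is the running sum of dnbinom under the mu/size parameterization.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
counts <- rnbinom(100, mu = 3, size = 1)     # 100 simulated weekly counts

# Step 1: histogram on the probability scale with one bin per integer,
# centered at 0, 1, 2, ...
hist(counts, probability = TRUE,
     breaks = seq(-0.5, max(counts) + 1, by = 1),
     xlab = "Pictures per week", main = "")

# Step 2: integer support covered by the simulated data
x <- seq(0, max(counts), 1)

# Step 3: overlay the negative binomial probability mass function
lines(x, dnbinom(x, mu = 3, size = 1), type = "h", col = "red")

# Extra check: P(X <= 2) from pnbinom equals the sum of the pmf over 0, 1, 2
cdf_check <- all.equal(pnbinom(2, mu = 3, size = 1),
                       sum(dnbinom(0:2, mu = 3, size = 1)))
```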

9.3 Exercise 3

This exercise will have you work through a few different problems using probability rules.

  1. The probability a wolf pack encounters suitable prey on any given hunting bout is 0.4. If the pack finds a suitable animal, the probability that it will successfully kill it is 0.05. What is the probability that the pack will have a successful kill on a given hunting bout?

  2. A nest has a daily survival probability of 0.98 for each of the first 3 days and a daily survival probability of 0.95 for the next 17 days. Assume birds leave the nest after 20 days. What is the probability that a nest will be successful? What is the probability the nest will fail exactly on the fourth day?

  3. Suppose you have 5 individuals in your family and each of you needs to take a covid test to determine if you can fly to see your parents. You read that there is a 2% chance of a false positive on any given test. What is the probability that at least one of you tests positive for covid when none of you have the disease?

  4. Consider a mark-recapture study with 3 recapture intervals (assume the population is demographically closed during the study - i.e., no individuals are “gained” or “lost” during the duration of the study). Assume the probability of surviving to the start of each interval is \(s\) and the probability of being detected at each of these time points is \(p\) when the animal is alive and 0 if the animal is dead (assume both \(s\) and \(p\) are constant for all intervals). Write down the probability associated with the following capture histories: (0 1 1), (1 0 1), and (1 0 0). Hint: there are multiple ways that we might see a “0” for the last two intervals - it may help to construct a probability tree representing the different events that can occur.

  5. Assume that 0.9% of Minnesotans are infected with covid-19. Further, assume current covid tests have a sensitivity of 80%, meaning that the probability that a person with covid tests positive is 80%. Assume that the false positive rate is 0.5% (meaning that the probability of having a positive test, given one does not have covid, is 0.005). These statistics are summarized below:

  • P(a randomly selected individual has covid) = P(covid+) = 0.009
  • P(test positive | one has covid) = P(test+ | covid+) = 0.80
  • P(test positive | one does NOT have covid) = P(test+ | covid-) = 0.005

     1. What is the probability of getting a negative test if one has covid?
     2. What is the probability of getting a negative test if one does not have covid?
     3. What is the probability that a randomly chosen individual tests positive for covid (irrespective of whether they actually have covid or not)?
     4. What is the probability that one has covid, given that they tested positive for covid?
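
The complement rule, the law of total probability, and Bayes' rule used in the last problem can be sketched in R as follows. The prevalence, sensitivity, and false positive rate below are hypothetical numbers chosen for illustration, not the values given in the exercise, and the variable names are assumptions.

```r
# Hypothetical inputs for illustration only (not the values in the exercise)
prevalence  <- 0.10   # P(D+)
sensitivity <- 0.90   # P(T+ | D+)
false_pos   <- 0.05   # P(T+ | D-)

# Complement rule: P(T- | D+) = 1 - P(T+ | D+)
false_neg <- 1 - sensitivity

# Law of total probability: P(T+) = P(T+ | D+) P(D+) + P(T+ | D-) P(D-)
p_test_pos <- sensitivity * prevalence + false_pos * (1 - prevalence)

# Bayes' rule: P(D+ | T+) = P(T+ | D+) P(D+) / P(T+)
p_disease_given_pos <- sensitivity * prevalence / p_test_pos
```

With these made-up inputs, p_test_pos is 0.135 and p_disease_given_pos is 2/3.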

9.4 Exercise 4

We will often use compound distributions to represent multiple sources of variability in ecological data. A compound distribution is formed by assuming the parameters of the distribution of interest are also random variables (from another specified distribution).

  1. Binomial Distribution: you are a bird researcher who has just discovered a new species of tern in Antarctica. Assume for the time being that this species of bird lays clutches of 11 eggs and that each egg has, on average, an 81% chance of survival (to hatch).

     1. Use rbinom to simulate the number of hatchlings in each of 1000 nests.
     2. Create a histogram showing the distribution of the number of hatchlings across the 1000 nests.
     3. What is the expected number of hatchlings in a clutch of size (\(n\)) = 11 (where each egg has an 81% probability of surviving to hatch [\(p\)])? Determine this value using the properties of the Binomial Distribution. Compare this value to the mean number of hatchlings determined by simulating 1000 nests.

  2. Beta-binomial (a compound distribution): Now, consider that some nests have higher survival rates than other nests. Beta random variables lie in the (0,1) interval, so the beta distribution is a natural distribution for modeling variability in \(p\) = probability of success. Assume that the probability of egg survival varies from nest to nest according to a beta distribution with parameters \(\alpha\) = 405 and \(\beta\) = 95 (these are referred to as shape1 and shape2 in R).

     1. Determine the average survival rate (across nests) when using this beta distribution. I.e., determine the expected value of the beta distribution with parameters \(\alpha\) = 405 and \(\beta\) = 95.
     2. Use rbeta(1000, shape1 = 405, shape2 = 95) to simulate expected survival rates in each of 1000 nests (and store these values in an R object called psurvs).
     3. Use rbinom to again simulate the number of eggs that survive in each of 1000 nests (assuming each nest has 11 eggs): rbinom(1000, size = 11, prob = psurvs).
     4. Plot the distribution of the number of hatchlings across the 1000 nests.

  3. Gamma-Poisson (a compound distribution that is equivalent to the Negative Binomial distribution): Assume that the expected clutch size varies from nest to nest according to a gamma distribution with shape parameter (\(\alpha\)) = 22 and rate parameter (\(\beta\)) = 2. Assume the actual clutch size, given the expected clutch size, follows a Poisson distribution.

     1. Simulate expected clutch sizes for each nest using rgamma(1000, shape = 22, rate = 2) (store these in an R object called lambdas).
     2. Simulate actual clutch sizes using rpois(1000, lambda = lambdas).
     3. Plot the distribution of clutch sizes across the 1000 nests.
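
The two-stage simulations described above can be sketched as follows, using the rbeta, rbinom, rgamma, and rpois calls given in the exercise. The seed, the object names other than psurvs and lambdas, and the use of barplot(table(...)) for plotting counts are assumptions made for illustration.

```r
set.seed(1)          # arbitrary seed for reproducibility
n_nests <- 1000

# Beta-binomial: draw a survival probability for each nest,
# then draw the number of hatchlings given that probability
psurvs <- rbeta(n_nests, shape1 = 405, shape2 = 95)
hatch_bb <- rbinom(n_nests, size = 11, prob = psurvs)

# Gamma-Poisson: draw an expected clutch size for each nest,
# then draw the realized clutch size given that expectation
lambdas <- rgamma(n_nests, shape = 22, rate = 2)
clutch <- rpois(n_nests, lambda = lambdas)

# Side-by-side bar plots of the simulated counts
op <- par(mfrow = c(1, 2))
barplot(table(hatch_bb), xlab = "Hatchlings", ylab = "Nests")
barplot(table(clutch), xlab = "Clutch size", ylab = "Nests")
par(op)
```

Because prob and lambda are vectors of length 1000, rbinom and rpois use a different parameter value for each nest, which is what makes the resulting distributions compound.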

How would we describe the distribution of hatchlings in part [2] and the distribution of clutch sizes in part [3]? One option is to specify the model hierarchically:

Let \(Y_i\) = the number of hatchlings in nest \(i\).

\(Y_i | p_i \sim Binomial(11, p_i)\), with \(p_i \sim Beta(\alpha, \beta)\)

Let \(Z_i\) = the size of the clutch in nest \(i\)

\(Z_i | \lambda_i \sim Poisson(\lambda_i)\), with \(\lambda_i \sim Gamma(\alpha, \beta)\)

It turns out that one can also analytically solve for the marginal distributions of \(Y_i\) and \(Z_i\) in these two special cases. The marginal distribution of \(Y_i\) is the distribution of \(Y_i | p_i\) averaged over all possible values of \(p_i\) (similarly, the marginal distribution of \(Z_i\) is the distribution of \(Z_i | \lambda_i\) averaged over all possible values of \(\lambda_i\)).

The marginal distribution of \(Y_i\) is called a beta-binomial distribution. For a description of its probability mass function, see:

https://en.wikipedia.org/wiki/Beta-binomial_distribution

This distribution differs from the conditional distribution \(Y_i | p_i\) (a binomial distribution).

  4. The marginal distribution of clutch sizes, \(Z_i\), is a negative binomial distribution with mean \(\mu = \alpha/\beta\) and dispersion parameter \(\theta= \alpha\). Verify this result by plotting the probability mass function for this negative binomial distribution.
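
As a complement to the graphical check, the claim that averaging a Poisson over a gamma gives a negative binomial can also be checked numerically. The sketch below uses hypothetical gamma parameters (shape 4, rate 2) rather than those of the exercise, and the helper marginal_pmf is an illustrative name; it integrates the Poisson probability mass function against the gamma density and compares the result with dnbinom.

```r
# Hypothetical gamma parameters for illustration (not those of the exercise)
a <- 4   # shape (alpha)
b <- 2   # rate (beta)

# Marginal P(Z = k): the Poisson pmf averaged over the gamma density of lambda
marginal_pmf <- function(k) {
  integrate(function(lambda) dpois(k, lambda) * dgamma(lambda, shape = a, rate = b),
            lower = 0, upper = Inf, rel.tol = 1e-8)$value
}

k <- 0:10
by_integration <- sapply(k, marginal_pmf)

# Negative binomial with mu = alpha / beta and size (theta) = alpha
by_dnbinom <- dnbinom(k, mu = a / b, size = a)

# Largest discrepancy between the two calculations
max_abs_diff <- max(abs(by_integration - by_dnbinom))
```

If the two vectors agree up to numerical integration error, max_abs_diff should be very close to zero.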

9.5 Exercise 5

This exercise will explore the Central Limit Theorem as it applies to the Binomial and Poisson Distributions and is based on Jack Weiss’s notes from his Ecol 563 course.

  1. The binomial probability mass function is given by \(f(x) = P(X=x) = {n \choose x}p^x(1-p)^{n-x}\). Use the dbinom function in R to calculate the probability mass function for \(x = 0, 1, 2, \ldots, 10\) when \(p = 0.1, 0.5, \text{ or } 0.9\) and \(n = 10\). Plot these different distributions next to each other in different panels or in the same plot window.

  2. Repeat but with \(n = 100\). Plot the probability mass function for \(x = 0, 1, 2, \ldots, 100\) when \(p = 0.1, 0.5, \text{ or } 0.9\).

  3. Comment on the shape of the distribution as \(p\) and \(n\) change. When might the Normal distribution serve as a good approximation to the Binomial distribution?

  4. The Poisson probability mass function is given by \(f(x) = P(X=x) = \frac{\exp(-\lambda )(\lambda)^x}{x!}\). Use the dpois function in R to calculate the probability mass function for \(x = 0, 1, 2, \ldots, 35\) when \(\lambda = 1, 5, \text{ or } 15\). Plot these different distributions next to each other in different panels or in the same plot window.

  5. Comment on the shape of the distribution as \(\lambda\) changes. When might the Normal distribution serve as a good approximation to the Poisson distribution?

  6. Consider the probability mass function for the negative binomial specified in terms of \(\mu\) and \(\theta\), \(f(x) = P(X = x) = {x+\theta-1 \choose x}\left(\frac{\theta}{\mu+\theta}\right)^{\theta}\left(\frac{\mu}{\mu+\theta}\right)^x\). Fix the mean of the negative binomial distribution to be 1 (\(\mu\) = 1) and vary the dispersion parameter, \(\theta\). Plot the probability mass function for \(X = 0, 1, 2, \ldots, 20\) when \(\theta = 0.1, 1, 10, \text{ or } 100\). Plot these different distributions next to each other in different panels or in the same plot window.

  7. What happens to the shape of the distribution and the fraction of zeros as \(\theta\) gets small?

  8. When \(\theta\) = 100, the distribution looks very Poisson-like. Compare this negative binomial distribution to a Poisson distribution with a mean of 1.

  9. Finally, fix the dispersion parameter of the negative binomial distribution at 0.1 and vary the mean with \(\mu = 1, 10, \text{ or } 20\). Plot these different distributions next to each other in different panels or in the same plot window.

  10. What happens to the shape of the distribution and the fraction of zeros as \(\mu\) increases?
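
One possible way to draw several probability mass functions in adjacent panels is sketched below. The values of n, p, and the large dispersion parameter are hypothetical and differ from those in the exercise. The final lines give a numerical counterpart to the visual comparison between a negative binomial with large \(\theta\) and a Poisson distribution.

```r
# Hypothetical settings (not those of the exercise)
n <- 20
probs <- c(0.2, 0.5, 0.8)
x <- 0:n

# One panel per value of p; type = "h" draws a vertical spike at each integer
op <- par(mfrow = c(1, length(probs)))
for (p in probs) {
  plot(x, dbinom(x, size = n, prob = p), type = "h",
       xlab = "x", ylab = "P(X = x)", main = paste0("n = ", n, ", p = ", p))
}
par(op)

# Sanity check: a pmf evaluated over its full support sums to 1
pmf_total <- sum(dbinom(x, size = n, prob = 0.5))

# With a very large dispersion parameter, the negative binomial pmf
# is numerically close to the Poisson pmf with the same mean
k <- 0:20
nb_vs_pois <- max(abs(dnbinom(k, mu = 1, size = 1e6) - dpois(k, lambda = 1)))
```

The same loop works for dpois or dnbinom by swapping the density function and looping over \(\lambda\), \(\theta\), or \(\mu\) instead of \(p\).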

9.6 Hints for plotting distributions:

We can plot the standard Normal probability density function and mark the 90th percentile using:

curve(dnorm,-3,3); lines(qnorm(0.9),dnorm(qnorm(0.9)), type="h", col="red")

We can shade the area under the N(0,1) probability density function to the right of the 90th percentile using:

x1=seq(qnorm(0.9), 3, 0.01) 
y1=dnorm(x1)
curve(dnorm,-3, 3)
lines(x1, y1, type = "h", col = "red")

Or:

x1 <- seq(qnorm(0.9), 3, 0.01)
curve(dnorm, -3, 3)
# Close the polygon along the x-axis so the shaded region starts at the 90th percentile
polygon(c(x1[1], x1, x1[length(x1)]), c(0, dnorm(x1), 0),
        col = "orange", lty = 2, lwd = 2, border = "red")