Chapter 5 Generalized Least Squares (GLS)
5.1 Exercise 1
This first exercise will consider data collected by Eric North, an urban forester who took my class the first or second year I taught it. Some of Eric’s master’s research was aimed at predicting damage to sidewalks from boulevard trees planted between the curb and sidewalk (North, Johnson, & Burk, 2015). The degree of damage largely depends on how much the trunk of the tree spreads out at its base (measured by “trunk flare”; Figure 5.1, right), which can be predicted from the diameter at breast height (Figure 5.1, left).
We will explore linear models relating trunk flare circumference (\(Y\)) to diameter at breast height (\(X\)). As noted by Hilbert et al. (2020) and North et al. (2015), there are several potential uses of this type of model, including:
- determining planting space needed for trees based on their potential size
- identifying areas where there might be current or future damage to infrastructure due to tree roots
- estimating stump size for cost estimates associated with stump removal.
The data for this set of exercises is contained in the `Data4Ecologists` package and can be accessed using:
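The snippet below is a minimal sketch of the loading step, assuming the `Data4Ecologists` package is already installed. It does not guess the name of the trunk flare data set; instead it lists the data sets shipped with the package so the right one can be identified and then loaded with `data()`. The object name `available` is introduced here for illustration only.

```r
# Assumption: Data4Ecologists is already installed in your R library.
if (requireNamespace("Data4Ecologists", quietly = TRUE)) {
  library(Data4Ecologists)
  # data(package = ...) returns a table of the data sets in a package;
  # the "Item" column holds the names that can be passed to data().
  available <- data(package = "Data4Ecologists")$results[, "Item"]
  print(available)
} else {
  available <- character(0)
  message("Data4Ecologists is not installed; install it before continuing.")
}
```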
- Subset the data to include only Acer saccharinum (Species = 4; silver maple).
- Fit a linear model relating trunk flare (\(Y\)) to diameter at breast height (\(X\)) using `lm`. Evaluate the assumptions of the model using residual plots.
- Fit a model where the variance is assumed to increase with `dbh` using the `varPower` function.
- Describe the fitted GLS model using a set of equations. Match the parameters in your equations to the output generated using the `summary` function applied to your fitted GLS model.
- Create a plot of Pearson residuals, \(r_i = \frac{Y_i - \hat{Y}_i}{\sqrt{\widehat{var}(Y_i)}}\), versus \(\hat{Y}_i\) (you can use `plot(modelname)` to accomplish this task). Comment on whether you think the GLS model is appropriate.
- Use AIC to compare the fit of the standard linear model and the fit of the GLS model (use `AIC(model1, model2)`, where `model1` and `model2` are the names you assign to the two fitted models). Before making this comparison, refit the linear model using `gls` (this will ensure both models are fit using “REML”, which will be discussed in a later chapter). Which model appears to give the better fit (note: lower AIC is “better”)?
- Inspect the estimated regression coefficients and their standard errors. Do they change much when you account for non-constant variance?
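The mechanics of these steps can be sketched on simulated data, which keeps the exercise itself open. The sketch assumes the `nlme` package (which provides `gls` and `varPower`); the data frame `sim`, its columns, the simulation settings, and the model names `fit_const` and `fit_power` are all made up for illustration and are not part of Eric's data. With `varPower(form = ~ dbh)`, `nlme` models the residual variance as \(\sigma^2 |X_i|^{2\delta}\), where \(\delta\) is the power parameter reported by `summary`.

```r
library(nlme)  # provides gls() and varPower()

set.seed(1)
n   <- 200
dbh <- runif(n, min = 5, max = 60)   # simulated predictor (hypothetical units)
# Simulated response whose standard deviation grows in proportion to dbh,
# so the true power parameter (delta) in this toy example is 1
flare <- 10 + 1.5 * dbh + rnorm(n, sd = 0.2 * dbh)
sim   <- data.frame(dbh = dbh, flare = flare)

# Constant-variance model, fit with gls() so that both fits use REML
fit_const <- gls(flare ~ dbh, data = sim)

# Variance modeled as sigma^2 * |dbh|^(2 * delta)
fit_power <- gls(flare ~ dbh, weights = varPower(form = ~ dbh), data = sim)

summary(fit_power)         # coefficients, residual standard error, power estimate
plot(fit_power)            # standardized (Pearson) residuals versus fitted values
AIC(fit_const, fit_power)  # lower AIC indicates the better-supported model

# Estimated power parameter on its natural (constrained) scale
delta_hat <- coef(fit_power$modelStruct$varStruct, unconstrained = FALSE)
delta_hat
```

Because the simulated standard deviation is proportional to `dbh`, the estimated power should land near 1 and the AIC comparison should favor `fit_power`; with real data, neither outcome is guaranteed, which is what the residual plot and AIC comparison are meant to check.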