
## 14.2 Model specification: Redundant variables

While there are some ways of testing for omitted variables, redundant ones are very difficult to diagnose. Yes, we could look at the significance of variables or compare models with and without some variables based on information criteria, but even if our approaches say that a variable is not significant, this does not mean that it is not needed in the model. There can be many reasons why a test would fail to reject H\(_0\) and AIC would prefer a model without the variable under consideration. So, it comes down to using judgment, trying to figure out whether a variable is needed in the model or not.

In the example with the Seatbelts data, `DriversKilled` would be a redundant variable. Let’s see what happens with the model in this case:

```
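# adam() comes from the smooth package
library(smooth)
# Regression on all variables, including the redundant DriversKilled
adamModelSeat04 <- adam(Seatbelts, "NNN",
                        formula=drivers~PetrolPrice+kms+
                            front+rear+law+DriversKilled)
# Show two diagnostic plots side by side
par(mfcol=c(1,2))
plot(adamModelSeat04,7:8)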
```

The residuals from this model look adequate, with the only issue being the first 45 observations lying below the zero line. The summary of this model is:

`summary(adamModelSeat04)`

```
##
## Model estimated using alm() function: Regression
## Response variable: drivers
## Distribution used in the estimation: Normal
## Loss function type: likelihood; Loss function value: 1159.417
## Coefficients:
##               Estimate Std. Error Lower 2.5% Upper 97.5%
## (Intercept)   320.2844   127.4014    68.9379    571.5145 *
## PetrolPrice   741.7600   769.1811  -775.7343   2258.5517
## kms            -0.0039     0.0042    -0.0122      0.0044
## front           0.9302     0.1375     0.6589      1.2014 *
## rear           -0.6859     0.2122    -1.1044     -0.2675 *
## law            67.9625    35.8203    -2.7064    138.5986
## DriversKilled   6.6785     0.4377     5.8150      7.5416 *
##
## Sample size: 192
## Number of estimated parameters: 7
## Number of degrees of freedom: 185
## Information criteria:
##      AIC     AICc      BIC     BICc
## 2332.834 2333.443 2355.637 2357.237
```

The uncertainty around the parameter for `DriversKilled` is narrow, showing that the variable has a positive impact on `drivers`. However, the issue here is not statistical, but rather fundamental: we have included a variable that is a part of our response variable. It does not explain why drivers get injured and killed, it just reflects a specific part of this relation. So it explains part of the variance that should have been explained by other variables (e.g. `kms` and `law`), making them statistically insignificant. So, based on technical analysis we would be inclined to keep the variable, but based on our understanding of the problem we should not.
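To see how much of the variance `DriversKilled` absorbs, one could refit the regression without it and compare the two versions. The sketch below is an assumption-laden illustration rather than the code used in this chapter: it relies on base R `lm()` instead of `adam()`, and the object names `seatbeltsData`, `lmWithKilled` and `lmWithoutKilled` are made up for the example. The information criteria and intervals it produces need not match the `adam()` output exactly.

```r
# Illustrative sketch: base R lm() is used as a stand-in for adam()
seatbeltsData <- as.data.frame(Seatbelts)

# Regression with the redundant variable
lmWithKilled <- lm(drivers ~ PetrolPrice + kms + front + rear + law + DriversKilled,
                   data = seatbeltsData)
# The same regression without it
lmWithoutKilled <- lm(drivers ~ PetrolPrice + kms + front + rear + law,
                      data = seatbeltsData)

# Information criteria of the two models
AIC(lmWithKilled, lmWithoutKilled)

# Confidence intervals for kms and law in each model
confint(lmWithKilled)[c("kms", "law"), ]
confint(lmWithoutKilled)[c("kms", "law"), ]
```

Given how narrow the interval for `DriversKilled` is, the information criteria should favour the model that includes it, which is exactly why the decision to drop the variable has to come from understanding the problem rather than from the statistics.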

If we have redundant variables in the model, then the model might overfit the data, leading to narrower prediction intervals and biased forecasts. The parameters of such a model are typically unbiased, but inefficient.
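The point about unbiased but inefficient parameters can be illustrated with a small simulation. This is a toy sketch on artificial data, not the Seatbelts case: the data generating process, the sample size, the number of replications and all variable names are assumptions made for the example. The true model contains only `x1`, while `x2` is correlated with `x1` and is redundant.

```r
set.seed(41)
nObs <- 100
nReps <- 1000
# Storage for the estimates of the slope of x1 from both models
slopeCorrect <- slopeRedundant <- numeric(nReps)

for (i in 1:nReps) {
    x1 <- rnorm(nObs)
    # x2 is correlated with x1, but does not enter the true model
    x2 <- 0.9 * x1 + rnorm(nObs, sd = 0.5)
    y <- 1 + 2 * x1 + rnorm(nObs)
    # Correctly specified model
    slopeCorrect[i] <- coef(lm(y ~ x1))["x1"]
    # Model with the redundant variable
    slopeRedundant[i] <- coef(lm(y ~ x1 + x2))["x1"]
}

# Averages of the estimates: both should be close to the true value of 2
c(correct = mean(slopeCorrect), redundant = mean(slopeRedundant))
# Standard deviations: the spread should be larger with the redundant variable
c(correct = sd(slopeCorrect), redundant = sd(slopeRedundant))
```

In this setup both models recover the slope on average, but the estimates from the model with the redundant variable vary more from sample to sample, which is what inefficiency means here.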