Lesson 4.10: Heteroskedasticity: Detection and Correction

This lesson provides a rigorous examination of heteroskedasticity, the violation of the classical assumption of constant error variance. We will derive the precise mathematical consequences of this violation for the OLS estimator's variance, detail the theoretical basis for its detection via the White test, and derive the heteroskedasticity-consistent covariance matrix estimator as the appropriate remedy.

Part 1: Formal Definition and Consequences

1.1 The Homoskedasticity Assumption in Matrix Form

One of the critical Gauss-Markov assumptions (Assumption 4) is that the variance-covariance matrix of the error vector $\bm{\epsilon}$, conditional on the design matrix $\mathbf{X}$, is spherical, i.e. proportional to the identity matrix. This property is known as **homoskedasticity**.

Assumption: Homoskedasticity

The error terms are homoskedastic if the conditional variance of each error term is a constant, $\sigma^2$, and the conditional covariance between any two distinct error terms is zero.

$$E[\epsilon_i^2 | \mathbf{X}] = \sigma^2 \quad \text{for all } i$$
$$E[\epsilon_i \epsilon_j | \mathbf{X}] = 0 \quad \text{for all } i \ne j$$

In matrix notation, this is expressed as:

$$E[\bm{\epsilon}\bm{\epsilon}^T | \mathbf{X}] = \sigma^2 \mathbf{I}_n$$

where $\mathbf{I}_n$ is the $n \times n$ identity matrix.

1.2 The Violation: Heteroskedasticity

Heteroskedasticity (often called "hetero") means the homoskedasticity assumption is violated. While we maintain the assumption of no autocorrelation ($E[\epsilon_i \epsilon_j | \mathbf{X}] = 0$ for $i \ne j$), the variance of the error terms is no longer constant.

Condition: Heteroskedasticity

The error terms are heteroskedastic if the conditional variance of the error term $\epsilon_i$ is not constant, but instead varies with $i$ (typically through the values of $\mathbf{x}_i$).

$$E[\epsilon_i^2 | \mathbf{X}] = \sigma_i^2$$

In this case, the variance-covariance matrix of the error vector is a non-scalar diagonal matrix, denoted $\mathbf{\Omega}$:

$$E[\bm{\epsilon}\bm{\epsilon}^T | \mathbf{X}] = \mathbf{\Omega} = \begin{bmatrix} \sigma_1^2 & 0 & \dots & 0 \\ 0 & \sigma_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n^2 \end{bmatrix}$$
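As a concrete illustration (not part of the formal derivation), the following minimal Python sketch simulates errors whose conditional standard deviation grows with a single regressor, so that the implied $\mathbf{\Omega}$ is diagonal but not scalar. The data-generating process and all numerical values are illustrative assumptions.

```python
# Illustrative simulation: Var(eps_i | x_i) = (0.5 * x_i)^2, so the error
# covariance matrix Omega is diagonal with unequal entries.
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.uniform(1.0, 10.0, size=n)      # single regressor (illustrative)
sigma_i = 0.5 * x                        # conditional std. dev. depends on x_i
eps = rng.normal(0.0, sigma_i)           # heteroskedastic draws
y = 2.0 + 3.0 * x + eps                  # assumed data-generating process

Omega = np.diag(sigma_i ** 2)            # E[eps eps' | X] under this DGP
print(Omega)                             # unequal diagonal, zero off-diagonal
```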

1.3 Mathematical Consequences for the OLS Estimator

Derivation: The True Variance of $\bm{\hat{\beta}}_{\text{OLS}}$

Let us derive the variance of the OLS estimator, $\bm{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$, without assuming homoskedasticity.

Step 1: Express the estimation error.

$$\bm{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{X}\bm{\beta} + \bm{\epsilon}) = \bm{\beta} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bm{\epsilon}$$
$$\bm{\hat{\beta}} - \bm{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bm{\epsilon}$$

Step 2: Use the definition of the variance-covariance matrix.

$$\text{Var}(\bm{\hat{\beta}} | \mathbf{X}) = E[(\bm{\hat{\beta}} - \bm{\beta})(\bm{\hat{\beta}} - \bm{\beta})^T | \mathbf{X}]$$

Step 3: Substitute the estimation error.

$$\text{Var}(\bm{\hat{\beta}} | \mathbf{X}) = E\left[ \left((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bm{\epsilon}\right) \left((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bm{\epsilon}\right)^T | \mathbf{X} \right]$$

Using the transpose rule $(\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T$, the second factor becomes $\bm{\epsilon}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$ (since $\mathbf{X}^T\mathbf{X}$, and hence its inverse, is symmetric).

$$\text{Var}(\bm{\hat{\beta}} | \mathbf{X}) = E\left[ (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \bm{\epsilon}\bm{\epsilon}^T \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} | \mathbf{X} \right]$$

Step 4: Apply the expectation operator. Because we condition on $\mathbf{X}$, the matrices $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ and $\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$ are non-random and can be pulled outside the expectation, which then applies only to the random component $\bm{\epsilon}\bm{\epsilon}^T$.

$$\text{Var}(\bm{\hat{\beta}} | \mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E[\bm{\epsilon}\bm{\epsilon}^T | \mathbf{X}] \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$$

Step 5: Insert the general error covariance matrix $\mathbf{\Omega}$.

True Variance of OLS

$$\mathbf{V}_{\text{True}}(\bm{\hat{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{\Omega} \mathbf{X} (\mathbf{X}^T\mathbf{X})^{-1}$$

This is the **true, general formula** for the variance of the OLS estimator.


The Failure Point: The variance formula reported by standard OLS software is obtained by substituting $\mathbf{\Omega} = \sigma^2\mathbf{I}$ into the true formula, i.e., by assuming homoskedasticity:

$$\mathbf{V}_{\text{OLS}}(\bm{\hat{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T (\sigma^2\mathbf{I}) \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1} (\mathbf{X}^T\mathbf{X}) (\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$$

When heteroskedasticity is present ($\mathbf{\Omega} \ne \sigma^2\mathbf{I}$), the standard OLS variance formula $\sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$ no longer equals the true variance, and the usual estimate of it is biased and inconsistent. Consequently, the default standard errors, t-statistics, and F-statistics are invalid, even though $\bm{\hat{\beta}}$ itself remains unbiased and consistent.
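To see the failure point numerically, here is a hedged simulation sketch (the data-generating process below is an assumption for illustration only) comparing the true sandwich variance of the slope estimate with the naive $s^2(\mathbf{X}^T\mathbf{X})^{-1}$ formula:

```python
# Compare the true variance of beta-hat (sandwich with the known Omega)
# against the naive homoskedastic formula s^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])            # intercept + one regressor
sigma_i = 0.5 * x                                # true, non-constant error std. dev.
y = 2.0 + 3.0 * x + rng.normal(0.0, sigma_i)

XtX_inv = np.linalg.inv(X.T @ X)
Omega = np.diag(sigma_i ** 2)

# True variance: (X'X)^{-1} X' Omega X (X'X)^{-1}
V_true = XtX_inv @ X.T @ Omega @ X @ XtX_inv

# What OLS software reports by default: s^2 (X'X)^{-1}
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
s2 = e @ e / (n - X.shape[1])
V_naive = s2 * XtX_inv

print("true slope SE :", np.sqrt(V_true[1, 1]))
print("naive slope SE:", np.sqrt(V_naive[1, 1]))  # typically too small under this DGP
```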

Part 2: Detection of Heteroskedasticity

The White Test for Heteroskedasticity

The White (1980) test provides a general method for detecting heteroskedasticity. The intuition is to test whether the variance of the errors is systematically related to the regressors.

Theoretical Basis: Since we cannot observe the true errors $\epsilon_i$, we use the squared OLS residuals, $e_i^2$, as observable proxies for the unknown error variances $\sigma_i^2$. We then test whether these proxies can be predicted by the original regressors.

Hypotheses:

  • $H_0$: Homoskedasticity (the error variance is constant and unrelated to $\mathbf{X}$).
  • $H_1$: Heteroskedasticity (the error variance is a function of $\mathbf{X}$).

Procedure:

  1. Run the primary regression $\mathbf{y} = \mathbf{X}\bm{\beta} + \bm{\epsilon}$ and obtain the residuals $\mathbf{e}$.
  2. Construct the squared residuals, $e_i^2$.
  3. Run an auxiliary regression of the squared residuals on a set of regressors $\mathbf{Z}$ that includes the original regressors, their squares, and their cross-products:
    $$e_i^2 = \delta_0 + \delta_1 z_{i1} + \dots + \delta_p z_{ip} + v_i$$
  4. Obtain the $R^2$ from this auxiliary regression, denoted $R^2_{\text{aux}}$.

The test statistic takes the form of a Lagrange Multiplier (LM) statistic.

The White Test Statistic (LM Version)

Under the null hypothesis of homoskedasticity, the following statistic is asymptotically distributed as a Chi-squared random variable:

$$\text{LM} = n R^2_{\text{aux}} \xrightarrow{d} \chi^2_p$$

where $p$ is the number of regressors in the auxiliary regression (excluding the constant).

Decision Rule: If $n R^2_{\text{aux}}$ exceeds the critical value from the $\chi^2_p$ distribution for a chosen significance level $\alpha$, we reject $H_0$ and conclude that heteroskedasticity is present.
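The procedure above translates almost line by line into code. The following Python sketch runs the auxiliary regression and computes $nR^2_{\text{aux}}$ by hand; the two-regressor data-generating process and the variable names are illustrative assumptions, not part of the lesson's data.

```python
# Hands-on White test: regress squared residuals on levels, squares, and
# cross-products of the regressors, then compare n*R^2 to a chi-squared.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - x2 + rng.normal(scale=np.exp(0.5 * x1))  # variance depends on x1

# Steps 1-2: primary regression and squared residuals
X = np.column_stack([np.ones(n), x1, x2])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
e2 = e ** 2

# Step 3: auxiliary regression on Z = (1, x1, x2, x1^2, x2^2, x1*x2)
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
fitted = Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
R2_aux = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 4 and decision rule: LM = n * R^2_aux, p = 5 non-constant regressors in Z
LM = n * R2_aux
p_value = stats.chi2.sf(LM, df=Z.shape[1] - 1)
print(f"LM = {LM:.2f}, p-value = {p_value:.4f}")  # small p-value => reject H0
```

In practice the same test is available pre-packaged, for example as `het_white` in `statsmodels.stats.diagnostic`, which performs this auxiliary regression internally.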

Part 3: Correction: Heteroskedasticity-Consistent Standard Errors (HCSE)

Given that the OLS coefficient estimates $\bm{\hat{\beta}}$ remain unbiased and consistent under heteroskedasticity, the most common remedy is not to change the estimator, but to correct the formula for its variance.

We begin with the true variance formula derived in Part 1:

$$\mathbf{V}_{\text{True}}(\bm{\hat{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{\Omega} \mathbf{X} (\mathbf{X}^T\mathbf{X})^{-1}$$

The challenge is that $\mathbf{\Omega}$ contains $n$ unknown variances. White (1980) showed that, although $\mathbf{\Omega}$ itself cannot be estimated consistently, the middle term $\mathbf{X}^T\mathbf{\Omega}\mathbf{X}$ (and hence the variance of $\bm{\hat{\beta}}$) can be.

The White/Eicker/Huber HCSE Estimator

The key insight is to replace the unknown diagonal elements of $\mathbf{\Omega}$, the $\sigma_i^2$, with their observable sample counterparts, the squared OLS residuals $e_i^2$.

We form the estimator $\mathbf{\hat{\Omega}} = \text{diag}(e_1^2, e_2^2, \dots, e_n^2)$, often denoted $\mathbf{S}$.

Substituting this into the true variance formula gives the **Heteroskedasticity-Consistent Covariance Matrix Estimator (HCSE)**, also known as the "sandwich estimator":

$$\mathbf{V}_{\text{HCSE}}(\bm{\hat{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{\hat{\Omega}} \mathbf{X} (\mathbf{X}^T\mathbf{X})^{-1}$$
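A minimal numerical sketch of this estimator, on assumed simulated data (the data-generating process below is an illustration, not part of the lesson):

```python
# HC0 sandwich estimator: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x)        # heteroskedastic errors (assumed DGP)

XtX_inv = np.linalg.inv(X.T @ X)
e = y - X @ (XtX_inv @ X.T @ y)                # OLS residuals

Omega_hat = np.diag(e ** 2)                    # S = diag(e_1^2, ..., e_n^2)
V_hcse = XtX_inv @ X.T @ Omega_hat @ X @ XtX_inv
print("robust SEs:", np.sqrt(np.diag(V_hcse)))
```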

Practical Implementations (HC0, HC1, HC2, HC3)

The basic White estimator (often called HC0) can be biased in small samples. Various small-sample corrections have been developed:

  • HC1 (Default in Stata): A simple degrees-of-freedom correction that rescales HC0 by $\frac{n}{n-k-1}$, where $k$ is the number of regressors excluding the constant.
  • HC2, HC3: More complex corrections that adjust for the leverage of individual observations, as sketched below. HC3 is often recommended for smaller samples as it is more conservative.
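A hedged sketch of how HC1 and HC3 modify the HC0 recipe, again on assumed simulated data, where $h_{ii}$ denotes the leverage of observation $i$:

```python
# HC1 rescales HC0 by n/(n-k-1); HC3 rescales each e_i^2 by 1/(1 - h_ii)^2,
# where h_ii = x_i' (X'X)^{-1} x_i is the leverage of observation i.
import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 1                                   # k = regressors excluding the constant
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x)         # assumed heteroskedastic DGP

XtX_inv = np.linalg.inv(X.T @ X)
e = y - X @ (XtX_inv @ X.T @ y)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_ii

def sandwich(w):
    # (X'X)^{-1} X' diag(w) X (X'X)^{-1}
    return XtX_inv @ (X * w[:, None]).T @ X @ XtX_inv

V_hc0 = sandwich(e ** 2)
V_hc1 = (n / (n - k - 1)) * V_hc0               # degrees-of-freedom correction
V_hc3 = sandwich(e ** 2 / (1.0 - h) ** 2)       # leverage-based correction

for name, V in (("HC0", V_hc0), ("HC1", V_hc1), ("HC3", V_hc3)):
    print(name, np.sqrt(np.diag(V)))
```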

The **Robust Standard Errors** ($\text{se}_{\text{robust}}$) reported by statistical software are the square roots of the diagonal elements of one of these estimated $\mathbf{V}_{\text{HCSE}}$ matrices.
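For completeness, a usage sketch with statsmodels (the simulated data are an assumption; the `cov_type` argument selects the HC variant):

```python
# Requesting robust standard errors from statsmodels via cov_type.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1.0, 5.0, size=200)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.normal(scale=x)          # assumed heteroskedastic DGP

classical = sm.OLS(y, X).fit()                   # default: homoskedastic SEs
robust = sm.OLS(y, X).fit(cov_type="HC3")        # heteroskedasticity-consistent SEs

print(classical.bse)                             # invalid under heteroskedasticity
print(robust.bse)                                # square roots of diag(V_HCSE)
```

Swapping `"HC3"` for `"HC0"`, `"HC1"`, or `"HC2"` selects the other variants discussed above.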