Lesson 4.10: Heteroskedasticity: Detection and Correction

This lesson provides a rigorous examination of heteroskedasticity, the violation of the classical assumption of constant error variance. We will derive the precise mathematical consequences of this violation for the OLS estimator's variance, detail the theoretical basis for its detection via the White test, and derive the heteroskedasticity-consistent covariance matrix estimator as the appropriate remedy.

Part 1: Formal Definition and Consequences

1.1 The Homoskedasticity Assumption in Matrix Form

One of the critical Gauss-Markov assumptions (Assumption 4) is that the variance-covariance matrix of the error vector $\bm{\epsilon}$, conditional on the design matrix $\mathbf{X}$, is spherical, i.e. proportional to the identity matrix. This property is known as **homoskedasticity**.

Assumption: Homoskedasticity

The error terms are homoskedastic if the conditional variance of each error term is a constant, $\sigma^2$, and the conditional covariance between any two distinct error terms is zero.

$$E[\epsilon_i^2 | \mathbf{X}] = \sigma^2 \quad \text{for all } i$$
$$E[\epsilon_i \epsilon_j | \mathbf{X}] = 0 \quad \text{for all } i \ne j$$

In matrix notation, this is expressed as:

$$E[\bm{\epsilon}\bm{\epsilon}^T | \mathbf{X}] = \sigma^2 \mathbf{I}_n$$

where $\mathbf{I}_n$ is the $n \times n$ identity matrix.

1.2 The Violation: Heteroskedasticity

Heteroskedasticity (often called "hetero") means the homoskedasticity assumption is violated. While we maintain the assumption of no autocorrelation ($E[\epsilon_i \epsilon_j | \mathbf{X}] = 0$ for $i \ne j$), the variance of the error terms is no longer constant.

Condition: Heteroskedasticity

The error terms are heteroskedastic if the conditional variance of the error term $\epsilon_i$ is not constant, but instead varies with $i$ (typically through the values of $\mathbf{x}_i$).

$$E[\epsilon_i^2 | \mathbf{X}] = \sigma_i^2$$

In this case, the variance-covariance matrix of the error vector is a non-scalar diagonal matrix, denoted $\mathbf{\Omega}$:

$$E[\bm{\epsilon}\bm{\epsilon}^T | \mathbf{X}] = \mathbf{\Omega} = \begin{bmatrix} \sigma_1^2 & 0 & \dots & 0 \\ 0 & \sigma_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n^2 \end{bmatrix}$$
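As a concrete illustration (not part of the formal derivation), the following minimal Python sketch simulates errors whose conditional standard deviation grows with a single regressor, so that the implied $\mathbf{\Omega}$ is diagonal but not scalar. The data-generating process and all numerical values are illustrative assumptions.

```python
# Illustrative simulation: Var(eps_i | x_i) = (0.5 * x_i)^2, so the error
# covariance matrix Omega is diagonal with unequal entries.
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.uniform(1.0, 10.0, size=n)      # single regressor (illustrative)
sigma_i = 0.5 * x                        # conditional std. dev. depends on x_i
eps = rng.normal(0.0, sigma_i)           # heteroskedastic draws
y = 2.0 + 3.0 * x + eps                  # assumed data-generating process

Omega = np.diag(sigma_i ** 2)            # E[eps eps' | X] under this DGP
print(Omega)                             # unequal diagonal, zero off-diagonal
```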

1.3 Mathematical Consequences for the OLS Estimator

Derivation: The True Variance of $\bm{\hat{\beta}}_{\text{OLS}}$

Let us derive the variance of the OLS estimator, $\bm{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$, without assuming homoskedasticity.

Step 1: Express the estimation error.

$$\bm{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{X}\bm{\beta} + \bm{\epsilon}) = \bm{\beta} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bm{\epsilon}$$
$$\bm{\hat{\beta}} - \bm{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bm{\epsilon}$$

Step 2: Use the definition of the variance-covariance matrix.

$$\text{Var}(\bm{\hat{\beta}} | \mathbf{X}) = E[(\bm{\hat{\beta}} - \bm{\beta})(\bm{\hat{\beta}} - \bm{\beta})^T | \mathbf{X}]$$

Step 3: Substitute the estimation error.

$$\text{Var}(\bm{\hat{\beta}} | \mathbf{X}) = E\left[ \left((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bm{\epsilon}\right) \left((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bm{\epsilon}\right)^T | \mathbf{X} \right]$$

Using the transpose rule $(\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T$, the second factor becomes $\bm{\epsilon}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$ (since $\mathbf{X}^T\mathbf{X}$, and hence its inverse, is symmetric).

$$\text{Var}(\bm{\hat{\beta}} | \mathbf{X}) = E\left[ (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \bm{\epsilon}\bm{\epsilon}^T \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} | \mathbf{X} \right]$$

Step 4: Apply the expectation operator. Because we condition on $\mathbf{X}$, the matrices $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ and $\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$ are non-random and can be pulled outside the expectation, which then applies only to the random component $\bm{\epsilon}\bm{\epsilon}^T$.

$$\text{Var}(\bm{\hat{\beta}} | \mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E[\bm{\epsilon}\bm{\epsilon}^T | \mathbf{X}] \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$$

Step 5: Insert the general error covariance matrix $\mathbf{\Omega}$.

True Variance of OLS

$$\mathbf{V}_{\text{True}}(\bm{\hat{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{\Omega} \mathbf{X} (\mathbf{X}^T\mathbf{X})^{-1}$$

This is the **true, general formula** for the variance of the OLS estimator.


The Failure Point: The variance formula reported by standard OLS software is obtained by substituting $\mathbf{\Omega} = \sigma^2\mathbf{I}$ into the true formula, i.e., by assuming homoskedasticity:

$$\mathbf{V}_{\text{OLS}}(\bm{\hat{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T (\sigma^2\mathbf{I}) \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1} (\mathbf{X}^T\mathbf{X}) (\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$$

When heteroskedasticity is present ($\mathbf{\Omega} \ne \sigma^2\mathbf{I}$), the standard OLS variance formula $\sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$ no longer equals the true variance, and the usual estimate of it is biased and inconsistent. Consequently, the default standard errors, t-statistics, and F-statistics are invalid, even though $\bm{\hat{\beta}}$ itself remains unbiased and consistent.
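To see the failure point numerically, here is a hedged simulation sketch (the data-generating process below is an assumption for illustration only) comparing the true sandwich variance of the slope estimate with the naive $s^2(\mathbf{X}^T\mathbf{X})^{-1}$ formula:

```python
# Compare the true variance of beta-hat (sandwich with the known Omega)
# against the naive homoskedastic formula s^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])            # intercept + one regressor
sigma_i = 0.5 * x                                # true, non-constant error std. dev.
y = 2.0 + 3.0 * x + rng.normal(0.0, sigma_i)

XtX_inv = np.linalg.inv(X.T @ X)
Omega = np.diag(sigma_i ** 2)

# True variance: (X'X)^{-1} X' Omega X (X'X)^{-1}
V_true = XtX_inv @ X.T @ Omega @ X @ XtX_inv

# What OLS software reports by default: s^2 (X'X)^{-1}
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
s2 = e @ e / (n - X.shape[1])
V_naive = s2 * XtX_inv

print("true slope SE :", np.sqrt(V_true[1, 1]))
print("naive slope SE:", np.sqrt(V_naive[1, 1]))  # typically too small under this DGP
```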

Part 2: Detection of Heteroskedasticity

The White Test for Heteroskedasticity

The White (1980) test provides a general method for detecting heteroskedasticity. The intuition is to test whether the variance of the errors is systematically related to the regressors.

Theoretical Basis: Since we cannot observe the true errors $\epsilon_i$, we use the squared OLS residuals, $e_i^2$, as observable proxies for the unknown error variances $\sigma_i^2$. We then test whether these proxies can be predicted by the original regressors.

Hypotheses:

  • $H_0$: Homoskedasticity (the error variance is constant and unrelated to $\mathbf{X}$).
  • $H_1$: Heteroskedasticity (the error variance is a function of $\mathbf{X}$).

Procedure:

  1. Run the primary regression $\mathbf{y} = \mathbf{X}\bm{\beta} + \bm{\epsilon}$ and obtain the residuals $\mathbf{e}$.
  2. Construct the squared residuals, $e_i^2$.
  3. Run an auxiliary regression of the squared residuals on a set of regressors $\mathbf{Z}$ that includes the original regressors, their squares, and their cross-products:
    $$e_i^2 = \delta_0 + \delta_1 z_{i1} + \dots + \delta_p z_{ip} + v_i$$
  4. Obtain the $R^2$ from this auxiliary regression, denoted $R^2_{\text{aux}}$.

The test statistic takes the form of a Lagrange Multiplier (LM) statistic.

The White Test Statistic (LM Version)

Under the null hypothesis of homoskedasticity, the following statistic is asymptotically distributed as a Chi-squared random variable:

$$\text{LM} = n R^2_{\text{aux}} \xrightarrow{d} \chi^2_p$$

where $p$ is the number of regressors in the auxiliary regression (excluding the constant).

Decision Rule: If $n R^2_{\text{aux}}$ exceeds the critical value from the $\chi^2_p$ distribution for a chosen significance level $\alpha$, we reject $H_0$ and conclude that heteroskedasticity is present.
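The procedure above translates almost line by line into code. The following Python sketch runs the auxiliary regression and computes $nR^2_{\text{aux}}$ by hand; the two-regressor data-generating process and the variable names are illustrative assumptions, not part of the lesson's data.

```python
# Hands-on White test: regress squared residuals on levels, squares, and
# cross-products of the regressors, then compare n*R^2 to a chi-squared.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - x2 + rng.normal(scale=np.exp(0.5 * x1))  # variance depends on x1

# Steps 1-2: primary regression and squared residuals
X = np.column_stack([np.ones(n), x1, x2])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
e2 = e ** 2

# Step 3: auxiliary regression on Z = (1, x1, x2, x1^2, x2^2, x1*x2)
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
fitted = Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
R2_aux = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 4 and decision rule: LM = n * R^2_aux, p = 5 non-constant regressors in Z
LM = n * R2_aux
p_value = stats.chi2.sf(LM, df=Z.shape[1] - 1)
print(f"LM = {LM:.2f}, p-value = {p_value:.4f}")  # small p-value => reject H0
```

In practice the same test is available pre-packaged, for example as `het_white` in `statsmodels.stats.diagnostic`, which performs this auxiliary regression internally.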

Part 3: Correction: Heteroskedasticity-Consistent Standard Errors (HCSE)

Given that the OLS coefficient estimates $\bm{\hat{\beta}}$ remain unbiased and consistent under heteroskedasticity, the most common remedy is not to change the estimator, but to correct the formula for its variance.

We begin with the true variance formula derived in Part 1:

$$\mathbf{V}_{\text{True}}(\bm{\hat{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{\Omega} \mathbf{X} (\mathbf{X}^T\mathbf{X})^{-1}$$

The challenge is that $\mathbf{\Omega}$ contains $n$ unknown variances. White (1980) showed that, although $\mathbf{\Omega}$ itself cannot be estimated consistently, the middle term $\mathbf{X}^T\mathbf{\Omega}\mathbf{X}$ (and hence the variance of $\bm{\hat{\beta}}$) can be.

The White/Eicker/Huber HCSE Estimator

The key insight is to replace the unknown diagonal elements of $\mathbf{\Omega}$, the $\sigma_i^2$, with their observable sample counterparts, the squared OLS residuals $e_i^2$.

We form the estimator $\mathbf{\hat{\Omega}} = \text{diag}(e_1^2, e_2^2, \dots, e_n^2)$, often denoted $\mathbf{S}$.

Substituting this into the true variance formula gives the **Heteroskedasticity-Consistent Covariance Matrix Estimator (HCSE)**, also known as the "sandwich estimator":

$$\mathbf{V}_{\text{HCSE}}(\bm{\hat{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{\hat{\Omega}} \mathbf{X} (\mathbf{X}^T\mathbf{X})^{-1}$$
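A minimal numerical sketch of this estimator, on assumed simulated data (the data-generating process below is an illustration, not part of the lesson):

```python
# HC0 sandwich estimator: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x)        # heteroskedastic errors (assumed DGP)

XtX_inv = np.linalg.inv(X.T @ X)
e = y - X @ (XtX_inv @ X.T @ y)                # OLS residuals

Omega_hat = np.diag(e ** 2)                    # S = diag(e_1^2, ..., e_n^2)
V_hcse = XtX_inv @ X.T @ Omega_hat @ X @ XtX_inv
print("robust SEs:", np.sqrt(np.diag(V_hcse)))
```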

Practical Implementations (HC0, HC1, HC2, HC3)

The basic White estimator (often called HC0) can be biased in small samples. Various small-sample corrections have been developed:

  • HC1 (Default in Stata): A simple degrees-of-freedom correction that rescales HC0 by $\frac{n}{n-k-1}$, where $k$ is the number of regressors excluding the constant.
  • HC2, HC3: More complex corrections that adjust for the leverage of individual observations, as sketched below. HC3 is often recommended for smaller samples as it is more conservative.
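A hedged sketch of how HC1 and HC3 modify the HC0 recipe, again on assumed simulated data, where $h_{ii}$ denotes the leverage of observation $i$:

```python
# HC1 rescales HC0 by n/(n-k-1); HC3 rescales each e_i^2 by 1/(1 - h_ii)^2,
# where h_ii = x_i' (X'X)^{-1} x_i is the leverage of observation i.
import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 1                                   # k = regressors excluding the constant
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x)         # assumed heteroskedastic DGP

XtX_inv = np.linalg.inv(X.T @ X)
e = y - X @ (XtX_inv @ X.T @ y)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_ii

def sandwich(w):
    # (X'X)^{-1} X' diag(w) X (X'X)^{-1}
    return XtX_inv @ (X * w[:, None]).T @ X @ XtX_inv

V_hc0 = sandwich(e ** 2)
V_hc1 = (n / (n - k - 1)) * V_hc0               # degrees-of-freedom correction
V_hc3 = sandwich(e ** 2 / (1.0 - h) ** 2)       # leverage-based correction

for name, V in (("HC0", V_hc0), ("HC1", V_hc1), ("HC3", V_hc3)):
    print(name, np.sqrt(np.diag(V)))
```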

The **Robust Standard Errors** ($\text{se}_{\text{robust}}$) reported by statistical software are the square roots of the diagonal elements of one of these estimated $\mathbf{V}_{\text{HCSE}}$ matrices.
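For completeness, a usage sketch with statsmodels (the simulated data are an assumption; the `cov_type` argument selects the HC variant):

```python
# Requesting robust standard errors from statsmodels via cov_type.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1.0, 5.0, size=200)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.normal(scale=x)          # assumed heteroskedastic DGP

classical = sm.OLS(y, X).fit()                   # default: homoskedastic SEs
robust = sm.OLS(y, X).fit(cov_type="HC3")        # heteroskedasticity-consistent SEs

print(classical.bse)                             # invalid under heteroskedasticity
print(robust.bse)                                # square roots of diag(V_HCSE)
```

Swapping `"HC3"` for `"HC0"`, `"HC1"`, or `"HC2"` selects the other variants discussed above.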