Lesson 2.3: Performing Surgery on the MVN

This lesson dives into the mechanics of the Multivariate Normal distribution. We will learn how to perform 'statistical surgery' by deriving the precise formulas for Marginal distributions (when we ignore variables) and Conditional distributions (when we observe variables). The conditional mean formula derived here is the fundamental justification for Linear Regression.

Part 1: Partitioning the System

Imagine a system of $k$ variables, $\mathbf{X} \sim \mathcal{N}_k(\bm{\mu}, \mathbf{\Sigma})$. To analyze it, we often need to split it into two groups:

  • $\mathbf{X}_1$: A vector of $p$ variables we want to predict or analyze.
  • $\mathbf{X}_2$: A vector of $k-p$ variables that we have observed.

We partition our mean vector and covariance matrix to match this structure:

$$\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{bmatrix}, \quad \bm{\mu} = \begin{bmatrix} \bm{\mu}_1 \\ \bm{\mu}_2 \end{bmatrix}, \quad \mathbf{\Sigma} = \begin{bmatrix} \mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12} \\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22} \end{bmatrix}$$

Deconstructing the Partitioned Covariance Matrix

  • $\mathbf{\Sigma}_{11}$: The covariance matrix *within* the $\mathbf{X}_1$ variables.
  • $\mathbf{\Sigma}_{22}$: The covariance matrix *within* the $\mathbf{X}_2$ variables.
  • $\mathbf{\Sigma}_{12}$: The matrix of covariances *between* every variable in $\mathbf{X}_1$ and every variable in $\mathbf{X}_2$ ($\mathbf{\Sigma}_{21} = \mathbf{\Sigma}_{12}^T$).

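To make the bookkeeping concrete, here is a minimal NumPy sketch of this partitioning. The 3-variable system, its numbers, and the split $p = 1$ are all made up purely for illustration:

```python
import numpy as np

# Hypothetical 3-variable system: X1 = (first variable), X2 = (last two).
# All numbers are made up for illustration.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 2.0],
                  [0.8, 2.0, 1.0]])

p = 1  # size of the X1 block
mu1, mu2 = mu[:p], mu[p:]
Sigma11 = Sigma[:p, :p]   # covariance within X1
Sigma12 = Sigma[:p, p:]   # covariance between X1 and X2
Sigma21 = Sigma[p:, :p]   # equals Sigma12.T
Sigma22 = Sigma[p:, p:]   # covariance within X2
```
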
Part 2: The Marginal Distribution (Ignoring Variables)

The first question is the simplest: If we have a joint model of 10 stocks but we only care about the first two, what is their distribution? The answer is called the **marginal distribution**.

Theorem: Marginal Distribution of an MVN

The marginal distribution of any subset of an MVN vector is also MVN, defined only by the corresponding sub-vector of the mean and sub-matrix of the covariance matrix.

$$\mathbf{X}_1 \sim \mathcal{N}(\bm{\mu}_1, \mathbf{\Sigma}_{11})$$

Intuition: This is the "zooming in" property. If a whole system is jointly Normal, any part you zoom in on is also Normal. This is a wonderfully simple and convenient result.
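
In code, "zooming in" is just slicing. The sketch below reuses the illustrative blocks defined above and checks the theorem by simulation:

```python
# The marginal of X1 is N(mu1, Sigma11): just the matching sub-blocks.
# Sanity check by simulation: sample the full joint, keep only the X1 columns.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
x1_samples = samples[:, :p]
print(x1_samples.mean(axis=0))           # ≈ mu1
print(np.cov(x1_samples, rowvar=False))  # ≈ Sigma11
```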

Part 3: The Conditional Distribution (Incorporating Information)

This is the main event and one of the most beautiful results in statistics. The **conditional distribution** answers the question: "Now that I have observed that $\mathbf{X}_2$ has the specific value $\mathbf{x}_2$, how does this new information change my beliefs about the distribution of $\mathbf{X}_1$?"

Theorem: Conditional Distribution of an MVN

The conditional distribution $\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2$ is also Multivariate Normal, with an updated mean and a variance that never increases (and typically shrinks).

$$\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim \mathcal{N}(\bm{\mu}_{1|2}, \mathbf{\Sigma}_{1|2})$$

The Conditional Mean: The Engine of Linear Regression

The updated mean, $\bm{\mu}_{1|2}$, is our new best guess for $\mathbf{X}_1$.

Formula: Conditional Mean

$$\bm{\mu}_{1|2} = E[\mathbf{X}_1 \mid \mathbf{X}_2=\mathbf{x}_2] = \bm{\mu}_1 + \mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \bm{\mu}_2)$$

Deconstructing the Formula's Intuition

Updated Belief = Original Belief + Adjustment

  • $\bm{\mu}_1$: Our starting best guess for $\mathbf{X}_1$ before any new information.
  • $(\mathbf{x}_2 - \bm{\mu}_2)$: The "surprise" in the data we observed. How far was $\mathbf{x}_2$ from its own average?
  • $\mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1}$: The **"transmission mechanism."** This matrix acts like a set of regression coefficients, translating the "surprise" in $\mathbf{X}_2$ into a specific adjustment for $\mathbf{X}_1$.
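
Continuing with the illustrative numbers from the partition sketch, the conditional mean is a single line of linear algebra. The observed value x2 below is, again, made up:

```python
# Suppose we observe X2 = x2 (hypothetical values, for illustration only).
x2 = np.array([3.5, 2.2])

# mu_{1|2} = mu1 + Sigma12 @ inv(Sigma22) @ (x2 - mu2)
adjustment = Sigma12 @ np.linalg.solve(Sigma22, x2 - mu2)
mu_cond = mu1 + adjustment
print(mu_cond)  # updated best guess for X1
```
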
The Conditional Variance: The Power of Information

The updated variance, $\mathbf{\Sigma}_{1|2}$, represents our remaining uncertainty about $\mathbf{X}_1$ after observing $\mathbf{x}_2$. It is never larger than the marginal variance $\mathbf{\Sigma}_{11}$, and it is strictly smaller whenever the two blocks are correlated.

Formula: Conditional Variance

$$\mathbf{\Sigma}_{1|2} = \text{Var}(\mathbf{X}_1 \mid \mathbf{X}_2=\mathbf{x}_2) = \mathbf{\Sigma}_{11} - \mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1} \mathbf{\Sigma}_{21}$$

Deconstructing the Formula's Intuition

Remaining Uncertainty = Original Uncertainty - Information Gained

  • $\mathbf{\Sigma}_{11}$: Our initial variance (uncertainty) in $\mathbf{X}_1$.
  • $\mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1} \mathbf{\Sigma}_{21}$: The amount of variance *explained* by $\mathbf{X}_1$'s linear relationship with $\mathbf{X}_2$. This term is always positive semi-definite (non-negative in the scalar case).

This proves that observing a correlated variable **never increases our uncertainty**. At worst (if they are uncorrelated), the "information gained" is zero and the variance is unchanged. Usually, it reduces our uncertainty.
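
Using the same illustrative blocks, we can compute the conditional covariance and verify the "never increases" claim directly:

```python
# Sigma_{1|2} = Sigma11 - Sigma12 @ inv(Sigma22) @ Sigma21
explained = Sigma12 @ np.linalg.solve(Sigma22, Sigma21)  # information gained
Sigma_cond = Sigma11 - explained                         # remaining uncertainty
print(Sigma_cond)

# The explained term is positive semi-definite, so the conditional
# variance can never exceed the marginal variance Sigma11.
print(np.all(np.linalg.eigvalsh(explained) >= -1e-12))   # True
```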

The Grand Connection: Why Linear Regression Works

Look again at the formula for the conditional mean:

$$E[\mathbf{X}_1 \mid \mathbf{X}_2=\mathbf{x}_2] = \bm{\mu}_1 + \mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \bm{\mu}_2)$$

If we group the terms, this is a linear function of $\mathbf{x}_2$:

$$E[\mathbf{X}_1 \mid \mathbf{X}_2=\mathbf{x}_2] = \underbrace{(\bm{\mu}_1 - \mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1}\bm{\mu}_2)}_{\text{Intercept}} + \underbrace{(\mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1})}_{\text{Slope } (\beta)} \mathbf{x}_2$$

This proves that if two sets of variables are jointly Multivariate Normal, the best possible prediction of one given the other (the conditional expectation) is an exactly linear function of the observed values. This is the theoretical justification for using Ordinary Least Squares (OLS) regression to model such systems.
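
As a sanity check, the sketch below (still using the illustrative numbers from above) compares the population intercept and slope $\mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1}$ with an OLS fit on a large simulated sample; the two should agree up to sampling noise:

```python
# Population slope and intercept implied by the conditional-mean formula.
beta = Sigma12 @ np.linalg.inv(Sigma22)   # slope matrix, shape (1, 2) here
intercept = mu1 - beta @ mu2              # intercept, shape (1,)

# OLS on a large simulated sample: regress the X1 column on the X2 columns.
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
y, X = samples[:, :p], samples[:, p:]
X_design = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
ols_coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(intercept, beta)   # population intercept and slope
print(ols_coef.T)        # ≈ [intercept, beta] recovered by OLS
```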

Up Next: Capstone 1: Let's Apply This to a Portfolio