Lesson 2.3: Performing Surgery on the MVN
This lesson dives into the mechanics of the Multivariate Normal distribution. We will learn how to perform 'statistical surgery' by deriving the precise formulas for Marginal distributions (when we ignore variables) and Conditional distributions (when we observe variables). The conditional mean formula derived here is the fundamental justification for Linear Regression.
Part 1: Partitioning the System
Imagine a system of variables, $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. To analyze it, we often need to split it into two groups:
- $\mathbf{x}_1$: A vector of variables we want to predict or analyze.
- $\mathbf{x}_2$: A vector of variables that we have observed.
We partition our mean vector and covariance matrix to match this structure:

$$
\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}, \qquad
\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \qquad
\boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix}
$$
Deconstructing the Partitioned Covariance Matrix
- $\boldsymbol{\Sigma}_{11}$: The covariance matrix *within* the $\mathbf{x}_1$ variables.
- $\boldsymbol{\Sigma}_{22}$: The covariance matrix *within* the $\mathbf{x}_2$ variables.
- $\boldsymbol{\Sigma}_{12}$: The matrix of covariances *between* every variable in $\mathbf{x}_1$ and every variable in $\mathbf{x}_2$. ($\boldsymbol{\Sigma}_{21} = \boldsymbol{\Sigma}_{12}^{\top}$)
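To make this partitioning concrete, here is a minimal NumPy sketch. The four-variable system, its numbers, and the choice of which indices belong to $\mathbf{x}_1$ versus $\mathbf{x}_2$ are all hypothetical, chosen only to show the mechanics of slicing out the blocks.

```python
import numpy as np

# Hypothetical 4-variable system: x1 = first two variables, x2 = last two.
mu = np.array([1.0, 2.0, 0.5, -1.0])        # full mean vector
Sigma = np.array([[2.0, 0.6, 0.3, 0.1],
                  [0.6, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.0, 0.5],
                  [0.1, 0.4, 0.5, 2.5]])    # full covariance matrix (symmetric)

idx1, idx2 = [0, 1], [2, 3]                 # which variables belong to x1 and x2

mu1, mu2 = mu[idx1], mu[idx2]               # partitioned mean vectors
S11 = Sigma[np.ix_(idx1, idx1)]             # covariance within x1
S22 = Sigma[np.ix_(idx2, idx2)]             # covariance within x2
S12 = Sigma[np.ix_(idx1, idx2)]             # cross-covariance between x1 and x2
S21 = S12.T                                 # Sigma_21 is the transpose of Sigma_12
```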
Part 2: The Marginal Distribution (Ignoring Variables)
The first question is the simplest: If we have a joint model of 10 stocks but we only care about the first two, what is their distribution? The answer is called the **marginal distribution**.
Theorem: Marginal Distribution of an MVN
The marginal distribution of any subset of an MVN vector is also MVN, defined only by the corresponding sub-vector of the mean and sub-matrix of the covariance matrix. In the notation above, $\mathbf{x}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$.
Intuition: This is the "zooming in" property. If a whole system is jointly Normal, any part you zoom in on is also Normal. This is a wonderfully simple and convenient result.
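As a quick sanity check of this property, here is a sketch that reuses the same hypothetical four-variable numbers as the partitioning sketch above: the marginal parameters are just the sub-blocks, and simulated samples agree.

```python
import numpy as np

# Same hypothetical 4-variable system as in the partitioning sketch.
mu = np.array([1.0, 2.0, 0.5, -1.0])
Sigma = np.array([[2.0, 0.6, 0.3, 0.1],
                  [0.6, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.0, 0.5],
                  [0.1, 0.4, 0.5, 2.5]])

# The marginal of the first two variables is N(mu[:2], Sigma[:2, :2]) -- no integration needed.
mu1, S11 = mu[:2], Sigma[:2, :2]

# Sanity check by simulation: sample the full system, keep only the first two
# columns, and compare their sample mean/covariance to mu1 and Sigma_11.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
print(samples[:, :2].mean(axis=0))           # ~ mu1
print(np.cov(samples[:, :2], rowvar=False))  # ~ Sigma_11
```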
Part 3: The Conditional Distribution (Incorporating Information)
This is the main event and one of the most beautiful results in statistics. The **conditional distribution** answers the question: "Now that I have observed that $\mathbf{x}_2$ has the specific value $\mathbf{a}$, how does this new information change my beliefs about the distribution of $\mathbf{x}_1$?"
Theorem: Conditional Distribution of an MVN
The conditional distribution $p(\mathbf{x}_1 \mid \mathbf{x}_2 = \mathbf{a})$ is also Multivariate Normal, but with an updated mean and a variance that is never larger than before.
The updated mean, $\boldsymbol{\mu}_{1|2}$, is our new best guess for $\mathbf{x}_1$.
Formula: Conditional Mean

$$
\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{a} - \boldsymbol{\mu}_2)
$$
Deconstructing the Formula's Intuition
Updated Belief = Original Belief + Adjustment
- $\boldsymbol{\mu}_1$: Our starting best guess for $\mathbf{x}_1$ before any new information.
- $(\mathbf{a} - \boldsymbol{\mu}_2)$: The "surprise" in the data we observed. How far was $\mathbf{a}$ from its own average?
- $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$: The **"transmission mechanism."** This matrix acts like a set of regression coefficients, translating the "surprise" in $\mathbf{x}_2$ into a specific adjustment for $\mathbf{x}_1$.
The updated variance, $\boldsymbol{\Sigma}_{1|2}$, represents our remaining uncertainty about $\mathbf{x}_1$ after observing $\mathbf{x}_2$. It is never larger than the marginal variance $\boldsymbol{\Sigma}_{11}$.
Formula: Conditional Variance

$$
\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}
$$
Deconstructing the Formula's Intuition
Remaining Uncertainty = Original Uncertainty - Information Gained
- $\boldsymbol{\Sigma}_{11}$: Our initial variance (uncertainty) in $\mathbf{x}_1$.
- $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$: The amount of variance that was *explained* by $\mathbf{x}_1$'s linear relationship with $\mathbf{x}_2$. This term is always non-negative (positive semi-definite in the matrix case).
This proves that observing a correlated variable **never increases our uncertainty**. At worst (if they are uncorrelated), the "information gained" is zero and the variance is unchanged. Usually, it reduces our uncertainty.
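A companion sketch, reusing the same hypothetical bivariate numbers as the conditional-mean example, makes the "explained variance" term explicit and confirms that the remaining uncertainty cannot exceed the original.

```python
# Same hypothetical bivariate numbers as the conditional-mean sketch above.
S11, S22, S12 = 0.04, 0.02, 0.02

# Remaining uncertainty = original uncertainty - information gained
explained = S12 * (1.0 / S22) * S12   # Sigma_12 * Sigma_22^{-1} * Sigma_21
S_1_given_2 = S11 - explained         # conditional variance Sigma_{1|2}

print(explained)                      # 0.02: variance explained by observing x2
print(S_1_given_2)                    # 0.02 <= S11 = 0.04: uncertainty never grows
```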
The Grand Connection: Why Linear Regression Works
Look again at the formula for the conditional mean:

$$
\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{a} - \boldsymbol{\mu}_2)
$$

If we group the terms, this is a simple linear equation in the observed value $\mathbf{a}$:

$$
\boldsymbol{\mu}_{1|2} = \underbrace{\left(\boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\mu}_2\right)}_{\text{intercept}} + \underbrace{\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}}_{\text{slope}}\,\mathbf{a}
$$
This proves that if two sets of variables are jointly Multivariate Normal, the best possible prediction for one, given the other, is a linear function of the observed values. This is the theoretical justification for using Ordinary Least Squares (OLS) regression to model such systems.
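To see this connection empirically, here is a small simulation sketch with hypothetical parameters: for data drawn from a joint MVN, the OLS slope estimates should converge to $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$.

```python
import numpy as np

# Hypothetical joint system: a scalar x1 (first variable) and a 2-d x2 (last two).
rng = np.random.default_rng(42)
mu = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 2.0, 0.5],
                  [0.3, 0.5, 1.5]])

S12 = Sigma[0:1, 1:3]                               # cross-covariance (1 x 2)
S22 = Sigma[1:3, 1:3]                               # covariance of x2 (2 x 2)
theory_slope = S12 @ np.linalg.inv(S22)             # Sigma_12 * Sigma_22^{-1}

# Fit OLS on simulated draws from the joint distribution.
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
y, X = samples[:, 0], samples[:, 1:3]
X_design = np.column_stack([np.ones(len(X)), X])    # add an intercept column
beta = np.linalg.lstsq(X_design, y, rcond=None)[0]  # [intercept, slope_1, slope_2]

print(theory_slope.ravel())                         # theoretical coefficients
print(beta[1:])                                     # OLS slopes: nearly identical
```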