Lesson 1.7: The Master Tool: Moment Generating Functions (MGFs)

This lesson introduces one of the most powerful tools in probability theory: the Moment Generating Function (MGF). We will treat the MGF as the unique 'fingerprint' of a distribution. More importantly, we will rigorously derive the MGFs for our key discrete distributions and use them as an elegant 'moment factory' to derive their mean, variance, and higher moments with calculus instead of cumbersome summations.

Part 1: The 'Why' and 'How' of MGFs

1.1 The Problem: Moments are Hard to Calculate

We've defined moments like the mean and variance using summations, e.g., E[X] = \sum x \cdot p(x). This works for the first two moments, but what if we need the third moment (for skewness) or the fourth (for kurtosis)?

Calculating E[X^4] = \sum x^4 \cdot \binom{n}{x} p^x (1-p)^{n-x} directly is an algebraic nightmare. We need a more elegant and powerful method. That method is the MGF.

The MGF: A Distribution's Fingerprint

Definition: Moment Generating Function (MGF)

The MGF of a random variable X is defined as the expected value of e^{tX}:

M_X(t) = E[e^{tX}]

For a discrete random variable, this is calculated as:

M_X(t) = \sum_{\text{all } x} e^{tx} \cdot P(X=x)
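
To make the definition concrete before we derive anything, here is a minimal sketch in Python (using the sympy library) that computes an MGF straight from this defining sum; the fair-die example and variable names are purely illustrative and not part of the lesson's derivations.

```python
import sympy as sp

t = sp.symbols('t')

# Hypothetical example: a fair six-sided die, P(X = x) = 1/6 for x = 1..6.
pmf = {x: sp.Rational(1, 6) for x in range(1, 7)}

# The defining sum: M_X(t) = sum over x of e^{t x} * P(X = x)
M = sum(sp.exp(t * x) * prob for x, prob in pmf.items())

print(sp.simplify(M))             # (e^t + e^{2t} + ... + e^{6t}) / 6
print(sp.diff(M, t).subs(t, 0))   # M'(0) = E[X] = 7/2
```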

The Core Derivation: Why is it a 'Moment Generator'?

This is the most important proof of this lesson. We use the Taylor series expansion of e^u around u = 0, where u = tX.

Step 1: Expand e^{tX} as a Taylor Series

e^{tX} = 1 + (tX) + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \dots = \sum_{k=0}^{\infty} \frac{(tX)^k}{k!}

Step 2: Apply the Expectation Operator

By definition, M_X(t) = E[e^{tX}]. Let's substitute the series expansion:

M_X(t) = E\left[\sum_{k=0}^{\infty} \frac{t^k X^k}{k!}\right]

Step 3: Use the Linearity of Expectation

By linearity of expectation, we can bring the expectation inside the sum (for this infinite series, the interchange is justified whenever the MGF exists in a neighborhood of t = 0). The terms t^k and k! are constants with respect to the random variable X, so they can be pulled out of each expectation.

M_X(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!} E[X^k]

Let's write out the first few terms to see the pattern:

M_X(t) = \frac{t^0}{0!}E[X^0] + \frac{t^1}{1!}E[X^1] + \frac{t^2}{2!}E[X^2] + \dots
M_X(t) = 1 + t E[X] + \frac{t^2}{2}E[X^2] + \dots

Step 4: Differentiate and Evaluate at t=0

Now, watch what happens when we differentiate with respect to t and then set t = 0.

First Derivative:

M'_X(t) = E[X] + t E[X^2] + \frac{t^2}{2}E[X^3] + \dots
\implies M'_X(0) = E[X]

Second Derivative:

M''_X(t) = E[X^2] + t E[X^3] + \dots
\implies M''_X(0) = E[X^2]

The pattern holds! The k-th derivative evaluated at t = 0 isolates the k-th moment: in general, M_X^{(k)}(0) = E[X^k].
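
If you'd like to see the 'moment generator' at work computationally, the sketch below (a hypothetical helper, not the lesson's notation) wraps the differentiate-and-evaluate recipe in sympy and checks it against brute-force summation for a small made-up PMF.

```python
import sympy as sp

t = sp.symbols('t')

def moment_from_mgf(M, k):
    """k-th raw moment E[X^k], read off as the k-th derivative of the MGF at t = 0."""
    return sp.simplify(sp.diff(M, t, k).subs(t, 0))

# Hypothetical example: X takes values 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
pmf = {0: sp.Rational(1, 4), 1: sp.Rational(1, 2), 2: sp.Rational(1, 4)}
M = sum(sp.exp(t * x) * prob for x, prob in pmf.items())

for k in range(1, 5):
    brute_force = sum(x**k * prob for x, prob in pmf.items())  # direct E[X^k]
    print(k, moment_from_mgf(M, k), brute_force)               # the two columns agree
```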

Part 2: Detailed Derivations for Key Distributions

1. The Bernoulli(p) Distribution (The Building Block)
A single trial with success probability p. PMF: P(X=1) = p, P(X=0) = 1-p.

Bernoulli MGF Derivation

M_X(t) = E[e^{tX}] = \sum e^{tx} P(X=x)
= e^{t \cdot 1} P(X=1) + e^{t \cdot 0} P(X=0)
= e^t \cdot p + 1 \cdot (1-p)
M_X(t) = pe^t + (1-p)

Bernoulli Moment Derivations

Mean:

M'_X(t) = p e^t
E[X] = M'_X(0) = p e^0 = p

Variance:

M''_X(t) = p e^t
E[X^2] = M''_X(0) = p e^0 = p
\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)
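
As an optional sanity check, assuming sympy is available, the following sketch differentiates the Bernoulli MGF symbolically and recovers the same mean and variance; the variable names are illustrative.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.symbols('p', positive=True)
M = p * sp.exp(t) + (1 - p)              # the Bernoulli MGF derived above

EX  = sp.diff(M, t, 1).subs(t, 0)        # first moment:  p
EX2 = sp.diff(M, t, 2).subs(t, 0)        # second moment: p
var = sp.simplify(EX2 - EX**2)           # p - p^2, i.e. p(1 - p)
print(EX, EX2, var)
```
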
2. The Binomial(n, p) Distribution (The Workhorse)

Binomial MGF Derivation

Let q = 1-p. PMF is P(X=x) = \binom{n}{x} p^x q^{n-x}.

M_X(t) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x q^{n-x}
= \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x q^{n-x}

Using the Binomial Theorem (a+b)^n = \sum_{x=0}^{n} \binom{n}{x} a^x b^{n-x} with a = pe^t and b = q:

M_X(t) = (pe^t + q)^n = (pe^t + 1 - p)^n

Binomial Moment Derivations

Mean:

M'_X(t) = n(pe^t + q)^{n-1} \cdot pe^t
E[X] = M'_X(0) = n(p+q)^{n-1} \cdot p = n(1)^{n-1} p = np

Variance: (Using Product Rule)

M''_X(t) = n(n-1)(pe^t + q)^{n-2}(pe^t)^2 + n(pe^t + q)^{n-1}(pe^t)
E[X^2] = M''_X(0) = n(n-1)(p+q)^{n-2} p^2 + n(p+q)^{n-1} p
= n(n-1)p^2 + np
\text{Var}(X) = (n(n-1)p^2 + np) - (np)^2
= n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1-p)
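
The same symbolic check works for the Binomial MGF; this is a sketch with illustrative names, not a required step.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.symbols('p', positive=True)
n = sp.symbols('n', positive=True, integer=True)
M = (p * sp.exp(t) + 1 - p) ** n                    # the Binomial MGF derived above

EX  = sp.simplify(sp.diff(M, t, 1).subs(t, 0))      # equals n*p
EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))      # equals n*(n - 1)*p**2 + n*p
var = sp.simplify(EX2 - EX**2)                      # equals n*p*(1 - p)
print(EX, EX2, var)
```
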
3. The Poisson(λ) Distribution (The Rare Event Counter)

Poisson MGF Derivation

M_X(t) = \sum_{x=0}^{\infty} e^{tx} \frac{e^{-\lambda} \lambda^x}{x!}
= e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!}

Using the series e^z = \sum_{x=0}^{\infty} z^x / x! with z = \lambda e^t:

= e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}

Masterclass: All Four Poisson Moments

Let's derive the first four moments of the Poisson, and the skewness and kurtosis built from them, as a demonstration of the MGF's power.

1. Mean:

M'_X(t) = e^{\lambda(e^t - 1)} \cdot \lambda e^t
E[X] = M'_X(0) = e^0 \cdot \lambda e^0 = \lambda

2. Variance:

M''_X(t) = \left(e^{\lambda(e^t - 1)} \lambda e^t\right) \lambda e^t + e^{\lambda(e^t - 1)} \lambda e^t
E[X^2] = M''_X(0) = (\lambda)(\lambda) + \lambda = \lambda^2 + \lambda
\text{Var}(X) = E[X^2] - (E[X])^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda

3. Skewness: (Requires 3rd derivative)

E[X^3] = M'''_X(0) = \lambda^3 + 3\lambda^2 + \lambda

The standardized skewness is E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{E[X^3] - 3\mu E[X^2] + 2\mu^3}{\sigma^3}.

For the Poisson, this simplifies to \text{Skewness} = 1/\sqrt{\lambda}.

4. Kurtosis: (Requires 4th derivative)

E[X^4] = M''''_X(0) = \lambda^4 + 6\lambda^3 + 7\lambda^2 + \lambda

The excess kurtosis simplifies to 1/\lambda.
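
All four of these results can be verified symbolically; the sketch below (illustrative names, using the standard central-moment formulas) recovers the raw moments, the skewness, and the excess kurtosis from the Poisson MGF.

```python
import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))                   # the Poisson MGF derived above

# First four raw moments E[X], E[X^2], E[X^3], E[X^4]
m1, m2, m3, m4 = [sp.simplify(sp.diff(M, t, k).subs(t, 0)) for k in range(1, 5)]
mu, var = m1, sp.simplify(m2 - m1**2)               # both equal lambda

skew = sp.simplify((m3 - 3*mu*m2 + 2*mu**3) / var**sp.Rational(3, 2))
exkurt = sp.simplify((m4 - 4*mu*m3 + 6*mu**2*m2 - 3*mu**4) / var**2 - 3)
print(m1, m2, m3, m4)     # lambda, lambda^2 + lambda, ...
print(skew, exkurt)       # equal 1/sqrt(lambda) and 1/lambda
```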

4. The Geometric(p) Distribution (The Waiting Game)

Geometric MGF Derivation

Let q = 1-p. PMF is P(X=x) = q^{x-1} p for x = 1, 2, \dots

M_X(t) = \sum_{x=1}^{\infty} e^{tx} q^{x-1} p = p \sum_{x=1}^{\infty} e^{tx} q^{x-1}
= pe^t \sum_{x=1}^{\infty} e^{t(x-1)} q^{x-1} = pe^t \sum_{k=0}^{\infty} (qe^t)^k

Using the geometric series \sum_{k=0}^{\infty} r^k = 1/(1-r) with r = qe^t (which converges as long as qe^t < 1):

M_X(t) = \frac{pe^t}{1 - qe^t} = \frac{pe^t}{1 - (1-p)e^t}

Geometric Moment Derivations

Mean: (Using Quotient Rule)

M'_X(t) = \frac{pe^t(1 - qe^t) - pe^t(-qe^t)}{(1 - qe^t)^2} = \frac{pe^t}{(1 - qe^t)^2}
E[X] = M'_X(0) = \frac{p}{(1-q)^2} = \frac{p}{p^2} = 1/p

Variance: The second derivative is messier, but evaluating at t = 0 gives:

E[X^2] = M''_X(0) = \frac{p(1+q)}{(1-q)^3} = \frac{2-p}{p^2}
\text{Var}(X) = \frac{2-p}{p^2} - (1/p)^2 = \frac{2-p-1}{p^2} = \frac{1-p}{p^2}
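
Once more, a quick symbolic check of the Geometric results is possible; this sketch uses illustrative names and assumes sympy.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.symbols('p', positive=True)
q = 1 - p
M = p * sp.exp(t) / (1 - q * sp.exp(t))             # the Geometric MGF derived above

EX  = sp.simplify(sp.diff(M, t, 1).subs(t, 0))      # equals 1/p
EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))      # equals (2 - p)/p**2
var = sp.simplify(EX2 - EX**2)                      # equals (1 - p)/p**2
print(EX, EX2, var)
```
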
Higher Moments Summary

While we derived the first two moments in detail for all distributions, the standardized higher moments are also important characteristics:

Distribution | Skewness | Excess Kurtosis
Binomial(n, p) | \frac{1-2p}{\sqrt{np(1-p)}} | \frac{1-6p(1-p)}{np(1-p)}
Poisson(λ) | 1/\sqrt{\lambda} | 1/\lambda
Geometric(p) | \frac{2-p}{\sqrt{1-p}} | 6 + \frac{p^2}{1-p}
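
These table entries can themselves be recovered from the MGFs. As one illustration, the sketch below (a hypothetical helper built on the standardized-moment formulas used in the Poisson masterclass) recomputes the Geometric row; the same function applied to the Binomial or Poisson MGF reproduces the other rows.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.symbols('p', positive=True)

def standardized_moments(M):
    """Skewness and excess kurtosis computed from the MGF M(t) via its raw moments."""
    m1, m2, m3, m4 = [sp.diff(M, t, k).subs(t, 0) for k in range(1, 5)]
    var = sp.simplify(m2 - m1**2)
    skew = (m3 - 3*m1*m2 + 2*m1**3) / var**sp.Rational(3, 2)
    exkurt = (m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4) / var**2 - 3
    return sp.simplify(skew), sp.simplify(exkurt)

M_geom = p * sp.exp(t) / (1 - (1 - p) * sp.exp(t))   # Geometric MGF from above
print(standardized_moments(M_geom))                   # matches the Geometric row
```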

What's Next? The Continuous World

We have now rigorously mastered the mathematical machinery for discrete random variables. We have a toolbox of distributions and a master tool (the MGF) for analyzing them.

It is time to cross the bridge into the continuous world. We must replace our summation tool (\sum) with the tool of integration (\int) and learn about Probability Density Functions (PDFs).