Lesson 1.7: The Master Tool: Moment Generating Functions (MGFs)

This lesson introduces one of the most powerful tools in probability theory: the Moment Generating Function (MGF). We will treat the MGF as the unique 'fingerprint' of a distribution. More importantly, we will rigorously derive the MGFs for our key discrete distributions and use them as an elegant 'moment factory' to derive their mean, variance, and higher moments with calculus instead of cumbersome summations.

Part 1: The 'Why' and 'How' of MGFs

1.1 The Problem: Moments are Hard to Calculate

We've defined moments like the mean and variance using summations, e.g., E[X] = \sum x \cdot p(x). This works for the first two moments, but what if we need the third moment (for skewness) or the fourth (for kurtosis)?

Calculating E[X^4] = \sum x^4 \cdot \binom{n}{x} p^x (1-p)^{n-x} directly is an algebraic nightmare. We need a more elegant and powerful method. That method is the MGF.

The MGF: A Distribution's Fingerprint

Definition: Moment Generating Function (MGF)

The MGF of a random variable X is defined as the expected value of e^{tX}:

M_X(t) = E[e^{tX}]

For a discrete random variable, this is calculated as:

M_X(t) = \sum_{\text{all } x} e^{tx} \cdot P(X=x)
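
To make the definition concrete before we derive anything, here is a minimal sketch in Python (using the sympy library) that computes an MGF straight from this defining sum; the fair-die example and variable names are purely illustrative and not part of the lesson's derivations.

```python
import sympy as sp

t = sp.symbols('t')

# Hypothetical example: a fair six-sided die, P(X = x) = 1/6 for x = 1..6.
pmf = {x: sp.Rational(1, 6) for x in range(1, 7)}

# The defining sum: M_X(t) = sum over x of e^{t x} * P(X = x)
M = sum(sp.exp(t * x) * prob for x, prob in pmf.items())

print(sp.simplify(M))             # (e^t + e^{2t} + ... + e^{6t}) / 6
print(sp.diff(M, t).subs(t, 0))   # M'(0) = E[X] = 7/2
```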

The Core Derivation: Why is it a 'Moment Generator'?

This is the most important proof of this lesson. We use the Taylor series expansion of e^u around u = 0, where u = tX.

Step 1: Expand e^{tX} as a Taylor Series

e^{tX} = 1 + (tX) + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \dots = \sum_{k=0}^{\infty} \frac{(tX)^k}{k!}

Step 2: Apply the Expectation Operator

By definition, M_X(t) = E[e^{tX}]. Let's substitute the series expansion:

M_X(t) = E\left[\sum_{k=0}^{\infty} \frac{t^k X^k}{k!}\right]

Step 3: Use the Linearity of Expectation

By linearity of expectation, we can bring the expectation inside the sum (for this infinite series, the interchange is justified whenever the MGF exists in a neighborhood of t = 0). The terms t^k and k! are constants with respect to the random variable X, so they can be pulled out of each expectation.

M_X(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!} E[X^k]

Let's write out the first few terms to see the pattern:

M_X(t) = \frac{t^0}{0!}E[X^0] + \frac{t^1}{1!}E[X^1] + \frac{t^2}{2!}E[X^2] + \dots
M_X(t) = 1 + t E[X] + \frac{t^2}{2}E[X^2] + \dots

Step 4: Differentiate and Evaluate at t=0

Now, watch what happens when we differentiate with respect to t and then set t = 0.

First Derivative:

M'_X(t) = E[X] + t E[X^2] + \frac{t^2}{2}E[X^3] + \dots
\implies M'_X(0) = E[X]

Second Derivative:

M''_X(t) = E[X^2] + t E[X^3] + \dots
\implies M''_X(0) = E[X^2]

The pattern holds! The k-th derivative evaluated at t = 0 isolates the k-th moment: in general, M_X^{(k)}(0) = E[X^k].
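
If you'd like to see the 'moment generator' at work computationally, the sketch below (a hypothetical helper, not the lesson's notation) wraps the differentiate-and-evaluate recipe in sympy and checks it against brute-force summation for a small made-up PMF.

```python
import sympy as sp

t = sp.symbols('t')

def moment_from_mgf(M, k):
    """k-th raw moment E[X^k], read off as the k-th derivative of the MGF at t = 0."""
    return sp.simplify(sp.diff(M, t, k).subs(t, 0))

# Hypothetical example: X takes values 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
pmf = {0: sp.Rational(1, 4), 1: sp.Rational(1, 2), 2: sp.Rational(1, 4)}
M = sum(sp.exp(t * x) * prob for x, prob in pmf.items())

for k in range(1, 5):
    brute_force = sum(x**k * prob for x, prob in pmf.items())  # direct E[X^k]
    print(k, moment_from_mgf(M, k), brute_force)               # the two columns agree
```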

Part 2: Detailed Derivations for Key Distributions

1. The Bernoulli(p) Distribution (The Building Block)
A single trial with success probability p. PMF: P(X=1) = p, P(X=0) = 1-p.

Bernoulli MGF Derivation

M_X(t) = E[e^{tX}] = \sum e^{tx} P(X=x)
= e^{t \cdot 1} P(X=1) + e^{t \cdot 0} P(X=0)
= e^t \cdot p + 1 \cdot (1-p)
M_X(t) = pe^t + (1-p)

Bernoulli Moment Derivations

Mean:

M'_X(t) = p e^t
E[X] = M'_X(0) = p e^0 = p

Variance:

M''_X(t) = p e^t
E[X^2] = M''_X(0) = p e^0 = p
\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)
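
As an optional sanity check, assuming sympy is available, the following sketch differentiates the Bernoulli MGF symbolically and recovers the same mean and variance; the variable names are illustrative.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.symbols('p', positive=True)
M = p * sp.exp(t) + (1 - p)              # the Bernoulli MGF derived above

EX  = sp.diff(M, t, 1).subs(t, 0)        # first moment:  p
EX2 = sp.diff(M, t, 2).subs(t, 0)        # second moment: p
var = sp.simplify(EX2 - EX**2)           # p - p^2, i.e. p(1 - p)
print(EX, EX2, var)
```
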
2. The Binomial(n, p) Distribution (The Workhorse)

Binomial MGF Derivation

Let q = 1-p. PMF is P(X=x) = \binom{n}{x} p^x q^{n-x}.

M_X(t) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x q^{n-x}
= \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x q^{n-x}

Using the Binomial Theorem (a+b)^n = \sum_{x=0}^{n} \binom{n}{x} a^x b^{n-x} with a = pe^t and b = q:

M_X(t) = (pe^t + q)^n = (pe^t + 1 - p)^n

Binomial Moment Derivations

Mean:

M'_X(t) = n(pe^t + q)^{n-1} \cdot pe^t
E[X] = M'_X(0) = n(p+q)^{n-1} \cdot p = n(1)^{n-1} p = np

Variance: (Using Product Rule)

M''_X(t) = n(n-1)(pe^t + q)^{n-2}(pe^t)^2 + n(pe^t + q)^{n-1}(pe^t)
E[X^2] = M''_X(0) = n(n-1)(p+q)^{n-2} p^2 + n(p+q)^{n-1} p
= n(n-1)p^2 + np
\text{Var}(X) = (n(n-1)p^2 + np) - (np)^2
= n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1-p)
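
The same symbolic check works for the Binomial MGF; this is a sketch with illustrative names, not a required step.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.symbols('p', positive=True)
n = sp.symbols('n', positive=True, integer=True)
M = (p * sp.exp(t) + 1 - p) ** n                    # the Binomial MGF derived above

EX  = sp.simplify(sp.diff(M, t, 1).subs(t, 0))      # equals n*p
EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))      # equals n*(n - 1)*p**2 + n*p
var = sp.simplify(EX2 - EX**2)                      # equals n*p*(1 - p)
print(EX, EX2, var)
```
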
3. The Poisson(λ) Distribution (The Rare Event Counter)

Poisson MGF Derivation

M_X(t) = \sum_{x=0}^{\infty} e^{tx} \frac{e^{-\lambda} \lambda^x}{x!}
= e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!}

Using the series e^z = \sum_{x=0}^{\infty} z^x / x! with z = \lambda e^t:

= e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}

Masterclass: All Four Poisson Moments

Let's derive the first four moments of the Poisson, and the skewness and kurtosis built from them, as a demonstration of the MGF's power.

1. Mean:

M'_X(t) = e^{\lambda(e^t - 1)} \cdot \lambda e^t
E[X] = M'_X(0) = e^0 \cdot \lambda e^0 = \lambda

2. Variance:

M''_X(t) = \left(e^{\lambda(e^t - 1)} \lambda e^t\right) \lambda e^t + e^{\lambda(e^t - 1)} \lambda e^t
E[X^2] = M''_X(0) = (\lambda)(\lambda) + \lambda = \lambda^2 + \lambda
\text{Var}(X) = E[X^2] - (E[X])^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda

3. Skewness: (Requires 3rd derivative)

E[X^3] = M'''_X(0) = \lambda^3 + 3\lambda^2 + \lambda

The standardized skewness is E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{E[X^3] - 3\mu E[X^2] + 2\mu^3}{\sigma^3}.

For the Poisson, this simplifies to \text{Skewness} = 1/\sqrt{\lambda}.

4. Kurtosis: (Requires 4th derivative)

E[X^4] = M''''_X(0) = \lambda^4 + 6\lambda^3 + 7\lambda^2 + \lambda

The excess kurtosis simplifies to 1/\lambda.
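
All four of these results can be verified symbolically; the sketch below (illustrative names, using the standard central-moment formulas) recovers the raw moments, the skewness, and the excess kurtosis from the Poisson MGF.

```python
import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))                   # the Poisson MGF derived above

# First four raw moments E[X], E[X^2], E[X^3], E[X^4]
m1, m2, m3, m4 = [sp.simplify(sp.diff(M, t, k).subs(t, 0)) for k in range(1, 5)]
mu, var = m1, sp.simplify(m2 - m1**2)               # both equal lambda

skew = sp.simplify((m3 - 3*mu*m2 + 2*mu**3) / var**sp.Rational(3, 2))
exkurt = sp.simplify((m4 - 4*mu*m3 + 6*mu**2*m2 - 3*mu**4) / var**2 - 3)
print(m1, m2, m3, m4)     # lambda, lambda^2 + lambda, ...
print(skew, exkurt)       # equal 1/sqrt(lambda) and 1/lambda
```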

4. The Geometric(p) Distribution (The Waiting Game)

Geometric MGF Derivation

Let q = 1-p. PMF is P(X=x) = q^{x-1} p for x = 1, 2, \dots

M_X(t) = \sum_{x=1}^{\infty} e^{tx} q^{x-1} p = p \sum_{x=1}^{\infty} e^{tx} q^{x-1}
= pe^t \sum_{x=1}^{\infty} e^{t(x-1)} q^{x-1} = pe^t \sum_{k=0}^{\infty} (qe^t)^k

Using the geometric series \sum_{k=0}^{\infty} r^k = 1/(1-r) with r = qe^t (which converges as long as qe^t < 1):

M_X(t) = \frac{pe^t}{1 - qe^t} = \frac{pe^t}{1 - (1-p)e^t}

Geometric Moment Derivations

Mean: (Using Quotient Rule)

M'_X(t) = \frac{pe^t(1 - qe^t) - pe^t(-qe^t)}{(1 - qe^t)^2} = \frac{pe^t}{(1 - qe^t)^2}
E[X] = M'_X(0) = \frac{p}{(1-q)^2} = \frac{p}{p^2} = 1/p

Variance: The second derivative is messier, but evaluating at t = 0 gives:

E[X^2] = M''_X(0) = \frac{p(1+q)}{(1-q)^3} = \frac{2-p}{p^2}
\text{Var}(X) = \frac{2-p}{p^2} - (1/p)^2 = \frac{2-p-1}{p^2} = \frac{1-p}{p^2}
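
Once more, a quick symbolic check of the Geometric results is possible; this sketch uses illustrative names and assumes sympy.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.symbols('p', positive=True)
q = 1 - p
M = p * sp.exp(t) / (1 - q * sp.exp(t))             # the Geometric MGF derived above

EX  = sp.simplify(sp.diff(M, t, 1).subs(t, 0))      # equals 1/p
EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))      # equals (2 - p)/p**2
var = sp.simplify(EX2 - EX**2)                      # equals (1 - p)/p**2
print(EX, EX2, var)
```
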
Higher Moments Summary

While we derived the first two moments in detail for all distributions, the standardized higher moments are also important characteristics:

Distribution | Skewness | Excess Kurtosis
Binomial(n, p) | \frac{1-2p}{\sqrt{np(1-p)}} | \frac{1-6p(1-p)}{np(1-p)}
Poisson(λ) | 1/\sqrt{\lambda} | 1/\lambda
Geometric(p) | \frac{2-p}{\sqrt{1-p}} | 6 + \frac{p^2}{1-p}
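
These table entries can themselves be recovered from the MGFs. As one illustration, the sketch below (a hypothetical helper built on the standardized-moment formulas used in the Poisson masterclass) recomputes the Geometric row; the same function applied to the Binomial or Poisson MGF reproduces the other rows.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.symbols('p', positive=True)

def standardized_moments(M):
    """Skewness and excess kurtosis computed from the MGF M(t) via its raw moments."""
    m1, m2, m3, m4 = [sp.diff(M, t, k).subs(t, 0) for k in range(1, 5)]
    var = sp.simplify(m2 - m1**2)
    skew = (m3 - 3*m1*m2 + 2*m1**3) / var**sp.Rational(3, 2)
    exkurt = (m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4) / var**2 - 3
    return sp.simplify(skew), sp.simplify(exkurt)

M_geom = p * sp.exp(t) / (1 - (1 - p) * sp.exp(t))   # Geometric MGF from above
print(standardized_moments(M_geom))                   # matches the Geometric row
```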

What's Next? The Continuous World

We have now rigorously mastered the mathematical machinery for discrete random variables. We have a toolbox of distributions and a master tool (the MGF) for analyzing them.

It is time to cross the bridge into the continuous world. We must replace our summation tool (\sum) with the tool of integration (\int) and learn about Probability Density Functions (PDFs).