Lesson 2.6: The t-Distribution (Student's t)
We now introduce the workhorse of all statistical inference. The t-distribution is what we use when we want to test hypotheses but don't know the true population variance (which is always). We'll derive its structure from the Normal and Chi-Squared distributions and understand why its 'fatter tails' are the key to honest statistical testing with real-world data.
Part 1: The Problem with the Real World
In a perfect theoretical world, we could test a hypothesis about a sample mean using the Z-statistic we know and love:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

But this formula has a fatal flaw for practical use: it requires $\sigma$, the true population standard deviation. In 99.9% of real-world scenarios, from analyzing stock returns to medical trials, $\sigma$ is unknown.
The Core Idea: The t-distribution is what you get when you build a Z-statistic but are forced to use the sample standard deviation ($s$) as a plug-in estimate for the true population standard deviation ($\sigma$).
This simple substitution changes everything. The new statistic no longer follows a perfect Normal distribution.
Definition: The Student's t-statistic

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

This new statistic follows a t-distribution with $n - 1$ degrees of freedom.
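To make the substitution concrete, here is a minimal sketch that builds the t-statistic by hand and checks it against `scipy.stats.ttest_1samp`. The data, seed, and null value `mu_0` are all hypothetical illustrations, not from the lesson:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=5.0, scale=2.0, size=20)  # hypothetical sample
mu_0 = 5.0  # null-hypothesis mean (assumed for illustration)

# Build the t-statistic by hand: plug the sample std dev s in for sigma
n = len(sample)
s = sample.std(ddof=1)                      # sample standard deviation
t_manual = (sample.mean() - mu_0) / (s / np.sqrt(n))

# scipy's one-sample t-test computes the same statistic
t_scipy, p_value = stats.ttest_1samp(sample, popmean=mu_0)

print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}")
```

The only change from the Z-statistic is `s` replacing `sigma`; everything that follows in this lesson is about the distributional consequences of that swap.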
Part 2: Understanding the t-Distribution
Imagine a plot showing a standard Normal curve (Z) in blue. Overlaid in red is a t-distribution curve with low df (e.g., 5 df), which is slightly shorter at the peak and visibly fatter in both tails.
The 'Uncertainty Tax': Fatter Tails
Using an estimate instead of the true value introduces extra uncertainty into our calculation. The t-distribution accounts for this by having **fatter tails** than the Normal distribution.
Think of this as an "uncertainty tax": for the convenience of using an estimate, we have to be more conservative. The fatter tails mean that more extreme values are more likely, so we'll need stronger evidence (a larger t-statistic) to reject a null hypothesis.
The "fatness" of the tails is controlled by the degrees of freedom ($\nu = n - 1$).
- Low df (small sample): Our estimate $s$ is unreliable. The uncertainty tax is high, and the tails are very fat.
- High df (large sample): Our estimate $s$ becomes very accurate. The uncertainty tax shrinks, and the t-distribution converges to become identical to the standard Normal Z-distribution. (Generally, for $n \geq 30$, they are practically the same.)
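The shrinking "uncertainty tax" can be seen directly by comparing two-sided 5% critical values of the t-distribution to the Normal one as df grows. A short sketch using `scipy.stats`:

```python
from scipy import stats

# Two-sided 5% test: compare t critical values to the z critical value
z_crit = stats.norm.ppf(0.975)  # Normal critical value, about 1.96
for df in [2, 5, 10, 30, 100]:
    t_crit = stats.t.ppf(0.975, df)
    print(f"df = {df:>3}: t critical = {t_crit:.3f}  (z = {z_crit:.3f})")
```

At low df the t critical value is far larger than 1.96 (stronger evidence required); by df = 100 the two are nearly indistinguishable.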
Part 3: The Formal Derivation
The t-distribution is a beautiful composite of the two distributions we just learned.
Formal Definition: The t-Distribution
If $Z \sim N(0, 1)$ and $V \sim \chi^2_\nu$ are independent, then the variable defined below follows a t-distribution with $\nu$ degrees of freedom:

$$T = \frac{Z}{\sqrt{V / \nu}} \sim t_\nu$$
Proof: How this definition creates our t-statistic
We need to show that $t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$ fits this structure.
Step 1: Identify the Z component (the numerator).

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$
Step 2: Identify the V component (related to the denominator). From the previous lesson, we know:

$$V = \frac{(n - 1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$

Here, the degrees of freedom $\nu = n - 1$.
Step 3: Construct the ratio $\sqrt{V / \nu}$.

$$\sqrt{\frac{V}{\nu}} = \sqrt{\frac{(n - 1)s^2 / \sigma^2}{n - 1}} = \sqrt{\frac{s^2}{\sigma^2}} = \frac{s}{\sigma}$$
Step 4: Assemble the final T statistic.

$$T = \frac{Z}{\sqrt{V / \nu}} = \frac{(\bar{X} - \mu) / (\sigma / \sqrt{n})}{s / \sigma}$$

The unknown $\sigma$ terms cancel out perfectly, leaving:

$$T = \frac{\bar{X} - \mu}{s / \sqrt{n}} \sim t_{n-1}$$
This proves that our practical statistic has the exact structure of a formal t-distribution.
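The construction $T = Z / \sqrt{V / \nu}$ can also be checked by simulation: draw a standard Normal and an independent Chi-Squared, form the ratio, and compare the empirical quantiles to the theoretical t-distribution. A sketch (the seed, df, and sample size are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 5                                    # degrees of freedom (arbitrary)
n_draws = 100_000

Z = rng.standard_normal(n_draws)          # Z ~ N(0, 1)
V = rng.chisquare(nu, n_draws)            # V ~ chi-squared with nu df
T = Z / np.sqrt(V / nu)                   # the formal construction

# Compare empirical quantiles to the theoretical t-distribution
for q in [0.90, 0.95, 0.99]:
    print(f"q = {q}: simulated {np.quantile(T, q):.3f} "
          f"vs theoretical {stats.t.ppf(q, nu):.3f}")
```

The simulated quantiles should track `stats.t.ppf(q, nu)` closely, confirming that the ratio really does follow a t-distribution with $\nu$ degrees of freedom.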
Part 4: Applications

- Econometrics & Finance: Every time you look at a regression output, the t-statistic and p-value for each coefficient ($\hat{\beta}_j$) are calculated using the t-distribution. It tells you whether a factor is a statistically significant predictor of the outcome.
- A/B Testing: When comparing the means of two groups (e.g., click-through rate of website version A vs. version B), the two-sample t-test is the standard method for determining if the observed difference is real or just due to random chance.
The t-test is arguably the most widely used statistical test in the world.
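As an illustration of the A/B-testing use case, here is a minimal two-sample t-test sketch with simulated data; the group means, spread, and sample sizes are invented for the example, and Welch's variant (`equal_var=False`) is used since the two groups need not share a variance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical click-through rates (in %) for two website versions
version_a = rng.normal(loc=3.0, scale=1.0, size=500)
version_b = rng.normal(loc=3.2, scale=1.0, size=500)

# Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(version_b, version_a, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference in means is unlikely to be random chance alone; with real data you would also check the practical size of the difference, not just its significance.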
What's Next? Testing the Whole Model
The t-test is perfect for checking the significance of a *single* variable. But how do we test the significance of our *entire regression model* at once? How do we test if a *group* of variables is jointly significant?
For that, we need a new tool. We need to compare the variance explained by our model to the residual variance. This requires taking a ratio of two Chi-Squared variables, which leads us to the final member of the sampling family: the F-Distribution.