Lesson 2.7: The F-Distribution (Fisher-Snedecor)
We now meet the final, and most general, of the sampling distributions. The F-distribution is the ultimate tool for comparing sources of variance. We'll derive it as a ratio of two Chi-Squared distributions, which makes it the perfect instrument for testing the overall significance of a model or the joint significance of a group of variables.
Part 1: The Signal-to-Noise Ratio
The t-test is perfect for judging a single variable. But what if we want to answer a bigger question, like "Is my entire model useful?" or "Does adding this group of 3 new features significantly improve my predictions?"
To answer this, we need to compare how much variance our new features *explain* (the signal) against the variance they *don't* explain (the noise). The F-distribution is the tool that governs this comparison.
The Core Idea: The F-distribution is the distribution of a ratio of two independent Chi-Squared variables, each divided by its degrees of freedom. Think of it as a "signal-to-noise" distribution.
Definition: The F-Distribution
Let $U$ and $V$ be two independent Chi-Squared random variables with $k_1$ and $k_2$ degrees of freedom, respectively.
The random variable defined below follows an F-distribution with $(k_1, k_2)$ degrees of freedom:

$$F = \frac{U / k_1}{V / k_2} \sim F(k_1, k_2)$$
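To make the definition concrete, here is a minimal simulation sketch (assuming NumPy and SciPy are available; the seed, sample size, and the choice $k_1 = 3$, $k_2 = 20$ are arbitrary) that forms the scaled ratio above and checks its quantiles against SciPy's theoretical F-distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
k1, k2 = 3, 20            # numerator and denominator degrees of freedom
n_sims = 100_000

# Draw two independent Chi-Squared variables and form the scaled ratio
U = rng.chisquare(k1, size=n_sims)
V = rng.chisquare(k2, size=n_sims)
F_samples = (U / k1) / (V / k2)

# The simulated quantiles should track the theoretical F(k1, k2) quantiles
for q in (0.50, 0.90, 0.95):
    print(f"q={q}: simulated={np.quantile(F_samples, q):.3f}, "
          f"theoretical={stats.f.ppf(q, k1, k2):.3f}")
```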
Part 2: Properties of the F-Distribution
The shape of the F-distribution is uniquely determined by two different degrees of freedom parameters:
- $k_1$: The **numerator degrees of freedom** (related to the "signal").
- $k_2$: The **denominator degrees of freedom** (related to the "noise").
Imagine a plot showing several right-skewed F-distribution curves, labeled with their (k1, k2) df pairs, like F(3, 20) or F(5, 50).
Like the Chi-Squared, the F-distribution is always non-negative and skewed to the right.
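A short plotting sketch (assuming matplotlib and SciPy are available) can reproduce the figure described above, using the same $(k_1, k_2)$ pairs mentioned there:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0.01, 4, 400)
for k1, k2 in [(3, 20), (5, 50)]:
    plt.plot(x, stats.f.pdf(x, k1, k2), label=f"F({k1}, {k2})")

plt.xlabel("F value")
plt.ylabel("Density")
plt.title("The F-distribution: non-negative and right-skewed")
plt.legend()
plt.show()
```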
The F-distribution elegantly connects to the t-distribution: if $T \sim t_k$, then $T^2 \sim F(1, k)$.
This shows that a t-test on a single coefficient is just a special case of an F-test where you are testing only one restriction ($k_1 = 1$).
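You can check this relationship numerically. The sketch below (assuming SciPy; $k = 20$ is an arbitrary choice) squares the two-sided 5% t critical value and compares it with the upper-tail 5% critical value of $F(1, k)$:

```python
from scipy import stats

k = 20  # denominator degrees of freedom (arbitrary choice for illustration)

t_crit = stats.t.ppf(0.975, k)    # two-sided 5% t cutoff
f_crit = stats.f.ppf(0.95, 1, k)  # upper-tail 5% F cutoff

print(t_crit**2, f_crit)  # both are approximately 4.35 for k = 20
```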
Part 3: The Connection to Regression Models
The F-statistic you see in every regression output is a direct application of this definition.
Deriving the OLS F-Statistic
When we test if a group of variables is jointly significant, we are comparing a restricted model (R) to an unrestricted model (UR).
Step 1: Define the Signal and Noise.
- Signal ($U$): The reduction in squared errors from adding the variables. We know $\frac{SSR_R - SSR_{UR}}{\sigma^2} \sim \chi^2_q$, where $q$ is the number of restrictions being tested. So $U = \frac{SSR_R - SSR_{UR}}{\sigma^2}$ and $k_1 = q$.
- Noise ($V$): The remaining squared errors in the full model. We know $\frac{SSR_{UR}}{\sigma^2} \sim \chi^2_{n-p-1}$. So $V = \frac{SSR_{UR}}{\sigma^2}$ and $k_2 = n - p - 1$ (where $p$ is the total number of predictors). Under the classical assumptions, $U$ and $V$ are independent, so the definition of the F-distribution applies.
Step 2: Construct the F-ratio.

$$F = \frac{U / k_1}{V / k_2} = \frac{\left(\frac{SSR_R - SSR_{UR}}{\sigma^2}\right) / q}{\left(\frac{SSR_{UR}}{\sigma^2}\right) / (n - p - 1)}$$

Step 3: The $\sigma^2$ terms cancel. This is the magic! We don't need to know the true error variance to calculate the statistic.

$$F = \frac{(SSR_R - SSR_{UR}) / q}{SSR_{UR} / (n - p - 1)} \sim F(q, \, n - p - 1)$$
This final formula is exactly the F-statistic used for joint hypothesis testing.
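As an illustration, the sketch below (the simulated data-generating process, sample size, and variable names are arbitrary) fits a restricted and an unrestricted model with plain NumPy least squares and plugs the resulting sums of squared residuals into the formula above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# Simulated data: y depends on x1 only; x2 and x3 are pure noise
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def ssr(X, y):
    """Sum of squared residuals from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
X_r  = np.column_stack([ones, x1])           # restricted model (drops x2, x3)
X_ur = np.column_stack([ones, x1, x2, x3])   # unrestricted model

q = 2   # number of restrictions tested (x2 and x3)
p = 3   # total predictors in the unrestricted model
ssr_r, ssr_ur = ssr(X_r, y), ssr(X_ur, y)

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - p - 1))
p_value = stats.f.sf(F, q, n - p - 1)
print(f"F = {F:.3f}, p-value = {p_value:.3f}")  # x2, x3 should look jointly insignificant
```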
- Overall Model Significance: The F-statistic reported at the top of every regression output tests the null hypothesis that all slope coefficients are jointly zero ($H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$). It answers the question, "Is this model, as a whole, better than just predicting the mean?" (See the sketch at the end of this part.)
- Feature Selection: In machine learning, the F-test is used to decide if adding a group of new features (e.g., adding polynomial terms or interaction effects) provides a statistically significant improvement in model fit, helping to prevent overfitting.
- Testing Economic Theories: In finance, the F-test is used to test complex hypotheses, such as the Capital Asset Pricing Model (CAPM), by checking whether a group of "alpha" factors is jointly equal to zero.
The F-test is the primary tool for making decisions about the structure of a model.
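For the first application above (overall model significance), here is a minimal sketch, assuming statsmodels is installed, that shows where the overall F-statistic and its p-value appear on a fitted OLS model; the simulated data is purely illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 0.5 + 1.5 * X[:, 0] + rng.normal(size=n)

# The fitted model reports the overall F-test of H0: all slope coefficients are zero
results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.fvalue, results.f_pvalue)  # also shown at the top of results.summary()
```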
What's Next? The Magic of Large Numbers
Congratulations! You have now met the entire family of sampling distributions (χ², t, and F) that form the foundation of classical statistical inference.
But all of these rely on a strong, often-violated assumption: that our data comes from a Normal distribution. What happens in the real world when our data is skewed or has weird properties? Can we still do statistics?
The answer is a resounding YES, thanks to the magic of **Asymptotic Theory**. The next part of our module introduces the two most powerful theorems in all of statistics: the Law of Large Numbers and the Central Limit Theorem.