Constructing Hypothesis Tests: t, χ², and F
Four tests, one framework. Learn which test to reach for — and how to run it — based on what you're testing.
One question determines everything
Before you touch a formula, ask: what am I testing? The answer funnels you into exactly one of four tests. The entire LOS reduces to this decision tree:
| What are you testing? | Test | Distribution |
|---|---|---|
| Value of a single population mean | t-test (or z if n large) | t or z |
| Equality of two means — independent samples | t-test (pooled variance) | t |
| Equality of two means — dependent samples | Paired comparisons t-test | t |
| Value of a single population variance | Chi-square (χ²) test | χ² |
| Equality of two population variances | F-test | F |
The first three are about means and all use some form of t-statistic. The last two are about variances and use distributions that look different — chi-square is right-skewed and bounded at zero, while the F-distribution is a ratio of two chi-squares. Let's build each one.
Testing a single population mean
This is the simplest case. You have one sample and want to test whether the population mean equals some hypothesized value. The logic is universal to all hypothesis tests: compute how far your sample result is from the hypothesized value, measured in standard error units.
The test statistic is t = (x̄ − μ₀) / (s/√n), with n − 1 degrees of freedom. The numerator is the observed gap. The denominator is the noise level. When this ratio is extreme enough to land in the rejection region, you reject H₀.
A researcher measures 250 daily returns on an options portfolio. Mean return: 0.1%. Standard deviation: 0.25%. Is the mean return different from zero?
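The arithmetic for this example can be checked in a few lines. Here's a sketch in Python (SciPy is assumed available for the t quantile):

```python
import math
from scipy import stats

# Worked example from the text: 250 daily returns,
# sample mean 0.1%, sample standard deviation 0.25%, H0: mu = 0.
n, x_bar, s, mu_0 = 250, 0.001, 0.0025, 0.0

# t = (observed gap) / (standard error)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
print(f"t = {t_stat:.2f}")  # about 6.32

# Two-tailed critical value at the 5% level, df = n - 1
t_crit = stats.t.ppf(0.975, df=n - 1)
print(f"critical value = ±{t_crit:.2f}")  # about ±1.97
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```

With t ≈ 6.32 far beyond the critical value, the mean return is significantly different from zero.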
Difference between two means (independent samples)
Now you have two separate samples and want to test whether their population means are equal. The key word is independent — the samples don't influence each other. This test uses a pooled variance when the population variances are assumed equal.
The intuition is clean: if the two sample means are far apart relative to the pooled noise level, the t-statistic is large and you reject equality. The pooled variance is just a weighted average of the two sample variances, weighted by their respective degrees of freedom.
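A minimal sketch of the pooled-variance computation. The sample sizes, means, and standard deviations below are made up for illustration, not from the source:

```python
import math
from scipy import stats

# Hypothetical samples (illustrative numbers)
n1, x1, s1 = 25, 0.064, 0.021   # sample 1: size, mean, std dev
n2, x2, s2 = 30, 0.052, 0.018   # sample 2

# Pooled variance: weighted average of the sample variances,
# weighted by their degrees of freedom
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# t-statistic for H0: mu1 = mu2
t_stat = (x1 - x2) / math.sqrt(sp2 / n1 + sp2 / n2)
df = n1 + n2 - 2
t_crit = stats.t.ppf(0.975, df=df)
print(f"t = {t_stat:.2f}, critical = ±{t_crit:.2f}")
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```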
The difference-in-means test requires three assumptions: independent samples, normally distributed populations, and equal population variances. If the samples are dependent, you need the paired comparisons test instead.
Paired comparisons (dependent samples)
When samples are linked — both affected by the same external factor — you can't use the difference-in-means test. Instead, you compute the difference within each pair, then test whether the mean of those differences is zero. This is really just a single-mean t-test applied to the differences.
The classic example: comparing betas of the same companies before and after an event. The returns of both periods are influenced by shared market conditions, so the samples are dependent. By working with the differences within each pair, you control for that shared influence.
If the two samples can be drawn from completely different pools (e.g., textile firms vs. paper firms), they're independent → use the difference-in-means test. If each observation in one sample has a natural partner in the other (e.g., the same firm's beta before vs. after), they're dependent → use the paired test.
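The paired procedure (difference within each pair, then a single-mean t-test on the differences) can be sketched as follows. The five firms and their before/after betas are invented for illustration; SciPy's `ttest_rel` is used only as a cross-check:

```python
import math
from scipy import stats

# Hypothetical before/after betas for the same five firms (made-up numbers)
before = [1.10, 0.95, 1.30, 1.05, 0.88]
after  = [1.00, 0.90, 1.15, 1.02, 0.80]

# Work with the within-pair differences
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))

# Single-mean t-test on the differences, H0: mean difference = 0
t_stat = d_bar / (s_d / math.sqrt(n))
print(f"mean difference = {d_bar:.3f}, t = {t_stat:.2f}")

# Cross-check against SciPy's built-in paired test
t_check, p_value = stats.ttest_rel(before, after)
```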
Testing a single population variance (χ² test)
Now we shift from means to variances. The chi-square test checks whether a population variance equals a specific value. The distribution is asymmetric — it's right-skewed and bounded at zero, because variances can't be negative.
For a two-tailed test at α = 0.05 with 30 degrees of freedom, the critical values are 16.791 (lower) and 46.979 (upper). Notice they're not symmetric around a center — that's because the chi-square distribution itself is asymmetric. The lower critical value comes from the 0.975 column of the chi-square table (97.5% of probability to its right), and the upper from the 0.025 column.
Chi-square table columns show probability in the right tail. For a two-tailed test at 5% significance: the lower critical value uses the 0.975 column (2.5% in the left tail = 97.5% to the right), and the upper critical value uses the 0.025 column (2.5% in the right tail).
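The critical values quoted above can be reproduced with SciPy, remembering that `chi2.ppf` takes a *left-tail* probability while the table columns give the *right-tail* probability:

```python
from scipy import stats

alpha, df = 0.05, 30

# Lower critical value: table's 0.975 column = left-tail prob of 0.025
lower = stats.chi2.ppf(alpha / 2, df)        # 16.791
# Upper critical value: table's 0.025 column = left-tail prob of 0.975
upper = stats.chi2.ppf(1 - alpha / 2, df)    # 46.979
print(f"reject H0 if chi2 < {lower:.3f} or chi2 > {upper:.3f}")

# The test statistic itself is (n - 1) * s^2 / sigma0^2
```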
Comparing two variances (F-test)
The F-test compares the variances of two populations by taking their ratio. If the variances are equal, the ratio should be close to 1. If it's significantly greater than 1, the variances are different.
The F-distribution has two sets of degrees of freedom — numerator and denominator — because it's a ratio of two independent chi-square variables divided by their respective degrees of freedom. Like the chi-square, it's right-skewed and bounded at zero.
By convention, the larger sample variance always goes in the numerator. This means F ≥ 1, so you only need to check the upper critical value from the F-table. If F > F_critical, reject equality. The lower critical value (the reciprocal of the upper critical value with the numerator and denominator degrees of freedom swapped) is never needed in practice.
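The put-the-larger-variance-on-top convention can be sketched directly. The sample sizes and variances below are hypothetical:

```python
from scipy import stats

# Hypothetical sample variances (illustrative numbers)
n1, s1_sq = 31, 0.00089
n2, s2_sq = 41, 0.00042

# Convention: larger sample variance in the numerator, so F >= 1
if s1_sq >= s2_sq:
    f_stat, df_num, df_den = s1_sq / s2_sq, n1 - 1, n2 - 1
else:
    f_stat, df_num, df_den = s2_sq / s1_sq, n2 - 1, n1 - 1

# Upper 2.5% critical value for a two-tailed test at 5% significance
f_crit = stats.f.ppf(0.975, df_num, df_den)
print(f"F = {f_stat:.2f}, critical = {f_crit:.2f}")
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```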
Watch how each distribution behaves
Each distribution's shape and rejection regions depend on its degrees of freedom; as the degrees of freedom change, the critical values shift with them.
What you should notice: the t-distribution is symmetric and approaches the standard normal as degrees of freedom increase. The chi-square distribution is right-skewed and shifts rightward as df grows. The F-distribution is always right-skewed — its shape depends on both numerator and denominator degrees of freedom.
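The convergence of the t-distribution to the standard normal is easy to see numerically; here's a small check of the 97.5th percentile as degrees of freedom grow:

```python
from scipy import stats

# 97.5th percentile of t for increasing df, versus the normal's ~1.96
for df in (5, 30, 120, 1000):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("normal:", round(stats.norm.ppf(0.975), 3))
```

The t critical value shrinks toward 1.96 from above, which is why small samples demand more extreme evidence to reject.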
Testing means? Use a t-test (independent or paired depending on sample dependence). Testing a single variance? Chi-square. Comparing two variances? F-test. Always: larger sample variance in the numerator for F, and remember chi-square tables show right-tail probability.
I'm pro AI. The content above was co-created with my good friend, Claude.