Constructing Hypothesis Tests: t, χ², and F

CFA Level I — Quantitative Methods

Constructing Hypothesis Tests

Five tests, one framework. Learn which test to reach for — and how to run it — based on what you're testing.

One question determines everything

Before you touch a formula, ask: what am I testing? The answer funnels you into exactly one of five tests. The entire LOS reduces to this decision tree:

| What are you testing? | Test | Distribution |
| --- | --- | --- |
| Value of a single population mean | t-test (or z if n large) | t or z |
| Equality of two means — independent samples | t-test (pooled variance) | t |
| Equality of two means — dependent samples | Paired comparisons t-test | t |
| Value of a single population variance | Chi-square (χ²) test | χ² |
| Equality of two population variances | F-test | F |

The first three are about means and all use some form of t-statistic. The last two are about variances and use distributions that look different — chi-square is right-skewed and bounded at zero, while the F-distribution is a ratio of two chi-squares. Let's build each one.
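The decision tree can be captured in a few lines of Python. The function name and its labels are purely illustrative, not part of any standard library:

```python
def choose_test(testing: str, samples: str = "one") -> str:
    """Map what you're testing to the appropriate hypothesis test."""
    if testing == "mean":
        return {
            "one": "t-test (single mean)",
            "two-independent": "t-test (pooled variance)",
            "two-dependent": "paired comparisons t-test",
        }[samples]
    if testing == "variance":
        # One variance vs. a hypothesized value -> chi-square;
        # two variances against each other -> F-test.
        return "chi-square test" if samples == "one" else "F-test"
    raise ValueError(f"unknown combination: {testing}, {samples}")
```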

Testing a single population mean

This is the simplest case. You have one sample and want to test whether the population mean equals some hypothesized value. The logic is universal to all hypothesis tests: compute how far your sample result is from the hypothesized value, measured in standard error units.

t = (x̄ − μ₀) / (s / √n)
df = n − 1. Use the z-statistic when the population standard deviation σ is known; for large samples (n ≥ 30), t and z critical values are nearly identical.

The numerator is the observed gap. The denominator is the noise level. When this ratio is extreme enough to land in the rejection region, you reject H₀.

Worked Example

A researcher measures 250 daily returns on an options portfolio. Mean return: 0.1%. Standard deviation: 0.25%. Is the mean return different from zero?

1. Hypotheses: H₀: μ = 0 vs. Hₐ: μ ≠ 0 (two-tailed)
2. Standard error: s/√n = 0.25%/√250 = 0.0158%
3. Test statistic: (0.1% − 0) / 0.0158% ≈ 6.33
4. Decision: 6.33 > 1.96 (critical z at α = 0.05) → Reject H₀
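As a sanity check, the worked example can be reproduced in a few lines of Python (all figures in percent, as given above):

```python
import math

# Worked example: 250 daily returns, mean 0.1%, standard deviation 0.25%
n = 250
xbar, mu0, s = 0.10, 0.0, 0.25

se = s / math.sqrt(n)          # standard error ~= 0.0158%
t_stat = (xbar - mu0) / se     # ~= 6.32 (6.33 above comes from rounding se first)

z_crit = 1.96                  # two-tailed critical z at alpha = 0.05
reject = abs(t_stat) > z_crit  # True -> reject H0
```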

Difference between two means (independent samples)

Now you have two separate samples and want to test whether their population means are equal. The key word is independent — the samples don't influence each other. This test uses a pooled variance when the population variances are assumed equal.

t = (x̄₁ − x̄₂) / √(s²p/n₁ + s²p/n₂)
where s²p = [(n₁−1)s²₁ + (n₂−1)s²₂] / (n₁ + n₂ − 2), df = n₁ + n₂ − 2

The intuition is clean: if the two sample means are far apart relative to the pooled noise level, the t-statistic is large and you reject equality. The pooled variance is just a weighted average of the two sample variances, weighted by their respective degrees of freedom.

Key Insight

The difference-in-means test requires three assumptions: independent samples, normally distributed populations, and equal population variances. If the samples are dependent, you need the paired comparisons test instead.
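The pooled-variance statistic is easy to sketch in code. The sample summaries below are made up for illustration, and the function name is ours, not a library call:

```python
import math

def pooled_t(x1bar, x2bar, s1, s2, n1, n2):
    """Return (t statistic, df) for two independent samples, assuming equal
    population variances, so the two sample variances are pooled."""
    # Pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return (x1bar - x2bar) / se, n1 + n2 - 2

# Hypothetical monthly-return summaries for two funds
t_stat, df = pooled_t(x1bar=0.338, x2bar=0.274,
                      s1=0.195, s2=0.189, n1=25, n2=25)
```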

Paired comparisons (dependent samples)

When samples are linked — both affected by the same external factor — you can't use the difference-in-means test. Instead, you compute the difference within each pair, then test whether the mean of those differences is zero. This is really just a single-mean t-test applied to the differences.

t = (d̄ − μ_d0) / s_d̄
where d̄ = mean of paired differences, μ_d0 = hypothesized mean difference (zero under H₀), s_d̄ = s_d / √n, df = n − 1

The classic example: comparing betas of the same companies before and after an event. The returns of both periods are influenced by shared market conditions, so the samples are dependent. By working with the differences within each pair, you control for that shared influence.
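The before/after idea can be sketched with hypothetical betas. The point to notice is that the computation is just a single-mean t-test on the pairwise differences:

```python
import math

# Hypothetical betas for the same six companies before and after an event
before = [1.10, 0.95, 1.25, 1.05, 0.88, 1.30]
after  = [1.05, 0.90, 1.15, 1.10, 0.80, 1.20]

diffs = [b - a for b, a in zip(before, after)]  # difference within each pair
n = len(diffs)
dbar = sum(diffs) / n                           # mean difference
s_d = math.sqrt(sum((d - dbar) ** 2 for d in diffs) / (n - 1))
se = s_d / math.sqrt(n)                         # standard error of dbar
t_stat = (dbar - 0) / se                        # H0: mean difference = 0, df = n - 1
```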

Independent vs. Dependent — The Exam Distinction

If the two samples can be drawn from completely different pools (e.g., textile firms vs. paper firms), they're independent → use the difference-in-means test. If each observation in one sample has a natural partner in the other (e.g., the same firm's beta before vs. after), they're dependent → use the paired test.

Where this rule of thumb can get tricky: sometimes dependence isn't obvious. Two industry portfolios measured over the same time periods are both affected by market returns — that shared exposure makes them dependent even though they're different portfolios.

Testing a single population variance (χ² test)

Now we shift from means to variances. The chi-square test checks whether a population variance equals a specific value. The distribution is asymmetric — it's right-skewed and bounded at zero, because variances can't be negative.

χ² = (n − 1) × s² / σ₀²
df = n − 1. Two-tailed tests require two critical values (upper and lower).

For a two-tailed test at α = 0.05 with 30 degrees of freedom, the critical values are 16.791 (lower) and 46.979 (upper). Notice they're not symmetric around a center — that's because the chi-square distribution itself is asymmetric. The lower critical value comes from the 0.975 column of the chi-square table (97.5% of probability to its right), and the upper from the 0.025 column.

Reading the χ² Table

Chi-square table columns show probability in the right tail. For a two-tailed test at 5% significance: the lower critical value uses the 0.975 column (2.5% in the left tail = 97.5% to the right), and the upper critical value uses the 0.025 column (2.5% in the right tail).
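A short sketch of the χ² statistic, using the df = 30 critical values quoted above; the sample figures here are hypothetical:

```python
# Chi-square test of a single variance (two-tailed, alpha = 0.05, df = 30)
n = 31
s = 0.045         # hypothetical sample standard deviation
sigma0 = 0.05     # hypothesized standard deviation under H0

chi2_stat = (n - 1) * s**2 / sigma0**2   # = 30 * 0.002025 / 0.0025 = 24.3

# Critical values from the chi-square table (0.975 and 0.025 columns, df = 30)
lower, upper = 16.791, 46.979
reject = chi2_stat < lower or chi2_stat > upper   # False: fail to reject H0
```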

Comparing two variances (F-test)

The F-test compares the variances of two populations by taking their ratio. If the variances are equal, the ratio should be close to 1. If it's significantly greater than 1, the variances are different.

F = s²₁ / s²₂
Always put the larger variance in the numerator. df₁ = n₁ − 1, df₂ = n₂ − 1.

The F-distribution has two sets of degrees of freedom — numerator and denominator — because it's a ratio of two independent chi-square variables divided by their respective degrees of freedom. Like the chi-square, it's right-skewed and bounded at zero.

Exam Shortcut

By convention, the larger sample variance always goes in the numerator. This means F ≥ 1, so you only need to check the upper critical value from the F-table. If F > F_critical, reject equality. The lower critical value (the reciprocal of the upper critical value with the degrees of freedom reversed) is never needed in practice.
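The convention can be sketched in a few lines, using the textile vs. paper figures that appear in the practice questions below:

```python
# F-test for equality of two variances: larger variance in the numerator
s1, n1 = 4.30, 31    # textiles: sample std. dev. and size
s2, n2 = 3.80, 41    # paper

v1, v2 = s1**2, s2**2
if v1 >= v2:
    f_stat, df1, df2 = v1 / v2, n1 - 1, n2 - 1
else:
    f_stat, df1, df2 = v2 / v1, n2 - 1, n1 - 1

f_crit = 1.94                  # upper critical F at 5%, df = (30, 40)
reject = f_stat > f_crit       # False: 1.28 < 1.94, fail to reject equality
```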

Commit before you see the answer

Your Prediction

A portfolio manager wants to test whether two steel companies have equal average returns over the same 60-month period. Both are affected by market conditions. Which test should she use?

If you chose the paired comparisons test: Correct. The returns of both companies over the same time period are influenced by shared market conditions — they're dependent samples. The paired comparisons test uses the differences in monthly returns within each pair (same month, different company) and tests whether the mean difference is zero.

If you chose the independent samples test: Not quite. This is a test of two means, but the samples aren't independent — both companies' returns over the same months are influenced by market and industry conditions. That dependence means you need the paired comparisons test, which works with the differences within each pair of observations.

Watch how each distribution behaves

Select a distribution to see its shape and rejection regions. Adjust the degrees of freedom to see how the distribution changes — and notice how the critical values shift.

Distribution Explorer

[Interactive widget: for the t-distribution with df = 30 at α = 5%, the critical values are −2.042 (lower) and 2.042 (upper); decision rule: reject H₀ if |t| > 2.042.]

What you should notice: the t-distribution is symmetric and approaches the standard normal as degrees of freedom increase. The chi-square distribution is right-skewed and shifts rightward as df grows. The F-distribution is always right-skewed — its shape depends on both numerator and denominator degrees of freedom.

The One-Liner

Testing means? Use a t-test (independent or paired depending on sample dependence). Testing a single variance? Chi-square. Comparing two variances? F-test. Always: larger sample variance in the numerator for F, and remember chi-square tables show right-tail probability.

Apply what you learned

Question 1
A fund advertises monthly return volatility of 3%. An analyst collects 36 months of returns and measures a standard deviation of 3.6%. She wants to test whether volatility has increased. What is the appropriate test?
If you chose a t-test: She's testing a variance (volatility = standard deviation), not a mean. The chi-square test is used for hypotheses about a single population variance.
If you chose the chi-square test: Correct. Volatility is the standard deviation, and she's testing whether the population variance differs from an advertised value. The chi-square test statistic is χ² = (n−1)s²/σ₀² with n−1 = 35 degrees of freedom. Since she suspects an increase, this would be a one-tailed test.
If you chose the F-test: The F-test compares variances of two different populations. Here, there's only one population — the fund's returns. She's testing whether its variance equals a specific value.
Question 2
A researcher tests H₀: σ² = 0.0016 using a sample of 24 observations and computes χ² = 20.76. The critical values at 5% significance with 23 df are 11.689 (lower) and 38.076 (upper). What is the correct conclusion?
If you chose "fail to reject H₀": Correct. 11.689 < 20.76 < 38.076, so the test statistic falls in the non-rejection region. The sample variance is not significantly different from the hypothesized value at the 5% level.
If you chose "reject H₀": The test statistic needs to fall outside both critical values to reject. Here, 20.76 is between 11.689 and 38.076 — it's inside the non-rejection region.
If you chose "the variance is significantly different": The sample variance might be numerically different, but that doesn't mean it's statistically significantly different. The test statistic (20.76) falls between the critical values, so the difference isn't large enough to reject H₀.
Question 3
An analyst examines earnings variability in two industries: textiles (n = 31, s = $4.30) and paper (n = 41, s = $3.80). She computes F = 4.30² / 3.80² = 1.28. The critical F-value at 5% (df₁ = 30, df₂ = 40) is 1.94. Which statement is most accurate?
If you chose "the variances are significantly different": Numerical differences aren't the same as statistical significance. The F-statistic of 1.28 doesn't exceed the critical value of 1.94, so the difference isn't significant at the 5% level.
If you chose the chi-square test: The chi-square test is for testing a single variance against a specific value. When comparing two variances, the F-test is correct.
If you chose "fail to reject equal variances": Correct. With the larger variance in the numerator, F = 1.28. Since 1.28 < 1.94 (the upper critical value), we fail to reject the null hypothesis of equal variances.

I'm pro AI. The content above was co-created with my good friend, Claude.