Hypothesis Testing Calculator
Run a complete, guided hypothesis test in 6 structured steps - from stating H₀ to the final conclusion.
📖 What is Hypothesis Testing?
Hypothesis testing is a formal statistical procedure for using sample data to evaluate a claim about a population parameter. It is the backbone of scientific inquiry - from clinical drug trials to manufacturing quality control to psychological experiments. The procedure answers the question: "Given what I observed in my sample, is there enough evidence to reject the assumption that nothing unusual is happening?"
Every hypothesis test involves two competing statements. The null hypothesis (H₀) is the default, conservative position - typically that a population mean equals a reference value, that two groups are the same, or that a proportion equals a claimed value. The alternative hypothesis (H₁) is what the researcher is trying to demonstrate: that there is a real effect, a real difference, or a meaningful departure from the reference.
The test works by computing a test statistic - a number that summarises how far the sample data are from what H₀ predicts. Under H₀, this statistic follows a known distribution (Z, t, F, chi-square). The p-value is the probability of observing a result at least as extreme as yours if H₀ were true; it is not the probability that H₀ itself is true. A small p-value (below the chosen significance level α, typically 0.05) is evidence against H₀, and we reject it in favour of H₁.
This calculator supports five major test types: the one-sample Z-test (population σ known), the one-sample t-test (σ estimated from sample), the one-proportion Z-test, the two-sample Welch's t-test, and the paired t-test. For every test, it produces all six standard steps and computes Cohen's d as an effect size to quantify practical significance alongside statistical significance.
📐 Formulas
One-sample Z: Z = (x̄ − μ₀) / (σ / √n) - use when population σ is known
One-sample t: t = (x̄ − μ₀) / (s / √n), df = n − 1 - use when σ is estimated by the sample SD s
One-proportion Z: Z = (p̂ − p₀) / √(p₀(1−p₀)/n) - normal approximation; valid when np₀ ≥ 5 and n(1−p₀) ≥ 5
Two-sample Welch's t: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂), with df from the Welch-Satterthwaite equation: df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1)]
Paired t: t = d̄ / (s_d / √n), df = n − 1, where d̄ = mean of pair differences, s_d = their SD
p-value: two-tailed p = 2 × P(T > |t_obs|); right-tailed p = P(T > t_obs); left-tailed p = P(T < t_obs) - compare to α; reject H₀ if p < α
Cohen's d (effect size): d = |x̄ − μ₀| / s for one-sample; d = |x̄₁ − x̄₂| / s_pooled for two-sample; d = |d̄| / s_d for paired
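The p-value formula above can be sketched in plain Python. This is a minimal illustration, not the calculator's own implementation: the names `t_pdf`, `t_upper_tail` and `p_value` are hypothetical, and the tail probability is approximated by Simpson's rule rather than taken from a statistics library.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t].

    steps must be even; the integral over [0, t] is subtracted from 0.5.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def p_value(t_obs: float, df: float, tail: str = "two") -> float:
    """p-value for an observed t statistic; tail is 'two', 'right' or 'left'."""
    upper = t_upper_tail(abs(t_obs), df)  # P(T > |t_obs|)
    if tail == "two":
        return 2 * upper
    if tail == "right":
        return upper if t_obs >= 0 else 1 - upper
    if tail == "left":
        return upper if t_obs <= 0 else 1 - upper
    raise ValueError("tail must be 'two', 'right' or 'left'")

# 2.228 is the tabulated two-tailed 5% critical value for df = 10,
# so the result should be close to 0.05.
print(round(p_value(2.228, 10), 4))
```

For Z-based tests the same tail logic applies with the standard normal distribution in place of t.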
📖 How to Use This Calculator
📝 Example Calculations
Example 1 - Medical Treatment Effectiveness (One-Sample t-Test)
A cardiologist wants to know if a new drug changes mean systolic blood pressure from the known baseline of 130 mmHg. A sample of 25 patients shows x̄ = 124, s = 10. Test at α = 0.05, two-tailed.
t = (124 − 130) / (10 / √25) = −6 / 2 = −3.000, df = 24
p ≈ 0.006 < 0.05 - Reject H₀. The drug significantly changes blood pressure.
Cohen's d = |124 − 130| / 10 = 0.60 - medium effect size.
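The arithmetic in Example 1 can be re-checked with a few lines of Python. This sketch uses the equivalent critical-value decision rule instead of a p-value; the critical value 2.064 is the tabulated two-tailed t for df = 24 at α = 0.05 and is an assumption brought in from a t-table, not a number from the example itself.

```python
import math

# Summary statistics from Example 1
x_bar, mu_0, s, n = 124.0, 130.0, 10.0, 25

# Assumed table value: two-tailed critical t for df = 24, alpha = 0.05
t_crit = 2.064

se = s / math.sqrt(n)             # standard error of the mean
t_stat = (x_bar - mu_0) / se      # one-sample t statistic
df = n - 1
cohens_d = abs(x_bar - mu_0) / s  # effect size

# Reject H0 when |t| exceeds the critical value (same decision as p < alpha)
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df}, d = {cohens_d:.2f}, reject H0: {reject}")
```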
Example 2 - Manufacturing Quality Test (One-Sample Z-Test)
A factory claims its bolts have mean diameter μ = 12.00 mm with known σ = 0.05 mm. A QA inspector measures n = 36 bolts and finds x̄ = 12.008 mm. Is the process off-spec? α = 0.05, two-tailed.
Z = (12.008 − 12.000) / (0.05 / √36) = 0.008 / 0.00833 = 0.96
p ≈ 0.337 > 0.05 - Fail to Reject H₀. No significant evidence the process is off-spec.
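As a rough check of Example 2, the Z statistic and two-tailed p-value can be reproduced with Python's standard library. `statistics.NormalDist` supplies the standard normal CDF; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Values from Example 2 (population sigma is known)
x_bar, mu_0, sigma, n = 12.008, 12.000, 0.05, 36
alpha = 0.05

z = (x_bar - mu_0) / (sigma / math.sqrt(n))  # one-sample Z statistic
p_two = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

decision = "Reject H0" if p_two < alpha else "Fail to reject H0"
print(f"Z = {z:.2f}, p = {p_two:.3f} -> {decision}")
```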
Example 3 - Election Polling (One-Proportion Z-Test)
An exit poll of 500 voters shows 54% (p̂ = 0.54) supporting Candidate A. Is there significant evidence the candidate leads (> 50%)? α = 0.05, one-tailed right.
SE = √(0.50 × 0.50 / 500) = 0.02236; Z = (0.54 − 0.50) / 0.02236 = 1.789
p ≈ 0.037 < 0.05 - Reject H₀. Statistically significant evidence that Candidate A leads.
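Example 3 can be sketched the same way. The snippet below first checks the normal-approximation condition (np₀ ≥ 5 and n(1−p₀) ≥ 5) and then computes the right-tailed p-value; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from Example 3
p_hat, p_0, n = 0.54, 0.50, 500
alpha = 0.05

# Normal approximation is only reasonable when both expected counts are >= 5
approx_ok = n * p_0 >= 5 and n * (1 - p_0) >= 5

se = math.sqrt(p_0 * (1 - p_0) / n)  # standard error under H0
z = (p_hat - p_0) / se               # one-proportion Z statistic
p_right = 1 - NormalDist().cdf(z)    # right-tailed p-value, H1: p > p_0

print(f"approximation ok: {approx_ok}, SE = {se:.5f}, Z = {z:.3f}, p = {p_right:.3f}")
```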
Example 4 - A/B Test (Two-Sample Welch's t-Test)
Website A: mean session time 240 s, s = 45, n = 80. Website B: mean 220 s, s = 60, n = 70. Did redesign improve engagement? α = 0.05, two-tailed.
SE = √(45²/80 + 60²/70) = √(25.31 + 51.43) = √76.74 = 8.76; t = (240 − 220) / 8.76 = 2.283
Welch df ≈ 127; p ≈ 0.024 < 0.05 - Reject H₀. The mean session times differ significantly, with Website A about 20 s higher.
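The Welch-Satterthwaite degrees of freedom are the easiest part of this example to get wrong by hand, so a short Python sketch may help. It recomputes the standard error, t statistic and df from the summary statistics (with n − 1 in the df denominators, the result is roughly 127), and adds Cohen's d using the pooled SD. Variable names are illustrative.

```python
import math

# Summary statistics from Example 4
m1, s1, n1 = 240.0, 45.0, 80   # Website A
m2, s2, n2 = 220.0, 60.0, 70   # Website B

v1, v2 = s1**2 / n1, s2**2 / n2  # per-group variance of the mean
se = math.sqrt(v1 + v2)          # Welch standard error
t_stat = (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Cohen's d with the pooled standard deviation
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
cohens_d = abs(m1 - m2) / s_pooled

print(f"SE = {se:.2f}, t = {t_stat:.3f}, df = {df:.1f}, d = {cohens_d:.2f}")
```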
Example 5 - Before/After Study (Paired t-Test)
20 students take a study skills course. Mean improvement in test score = 8.5 points, SD of differences = 12.0. Did the course help? α = 0.05, one-tailed right.
t = 8.5 / (12.0 / √20) = 8.5 / 2.683 = 3.168, df = 19
p ≈ 0.003 < 0.05 - Reject H₀. Course significantly improved scores. Cohen's d = 8.5 / 12.0 = 0.71 (medium-large effect).
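Example 5 starts from summary statistics. With raw data, the pair differences would be computed first and then summarised. The sketch below shows that step with made-up before/after scores for five students; these numbers are purely hypothetical and are not the data behind the example.

```python
import math
from statistics import mean, stdev

# Hypothetical scores for illustration only (not the Example 5 data)
before = [60, 72, 55, 80, 68]
after = [66, 75, 63, 82, 77]

diffs = [a - b for a, b in zip(after, before)]  # one difference per pair
d_bar = mean(diffs)   # mean of the pair differences
s_d = stdev(diffs)    # sample SD of the differences (n - 1 denominator)
n = len(diffs)

t_stat = d_bar / (s_d / math.sqrt(n))  # paired t statistic
df = n - 1
cohens_d = abs(d_bar) / s_d            # effect size for paired data

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, df = {df}, d = {cohens_d:.2f}")
```

The paired test is a one-sample t-test on the differences, which is why only d̄, s_d and n are needed once the differences are in hand.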