Wilcoxon Rank-Sum & Signed-Rank Test Calculator

Two Wilcoxon non-parametric tests in one: rank-sum for independent groups, signed-rank for paired data.

Group 1 - Independent Sample A

Group 2 - Independent Sample B

Rank Sums: n₁, n₂, W₁ (rank sum of group 1), W₂ (rank sum of group 2)

Test Statistics: W (rank sum of the smaller group), U₁ equivalent, U₂ equivalent, Z-statistic, p-value (two-tailed)

📖 What are the Wilcoxon Tests?

Frank Wilcoxon introduced two landmark non-parametric tests in a single 1945 paper, providing rigorous alternatives to the t-test that do not assume normally distributed data. The Wilcoxon rank-sum test compares two independent groups by converting all observations to ranks and testing whether the rank sums are consistent with the groups coming from the same distribution. The Wilcoxon signed-rank test handles paired data by ranking the absolute values of the differences and testing whether the positive and negative rank sums are balanced.

The rank-sum test is mathematically equivalent to the Mann-Whitney U test: the two statistics differ only by a linear transformation. Because they work on ranks, both tests are distribution-free: the null distribution of the statistic does not depend on whether the data are normal, skewed, heavy-tailed, or bounded. They are robust to outliers because an extreme value is simply the highest-ranked observation and cannot pull the test statistic arbitrarily far, as it can with a mean-based test.

The signed-rank test is the paired counterpart to the rank-sum test, analogous to the paired t-test. It computes differences between paired observations, removes zero differences, ranks the absolute differences, then splits the ranks into a positive rank sum W+ and a negative rank sum W−. Under the null hypothesis that the median difference is zero, W+ and W− should be approximately equal. The test statistic W = min(W+, W−) is compared to its null distribution (exactly for small n, by normal approximation for large n).

Both tests handle ties by assigning average ranks, with a tie correction applied to the variance formula to maintain accurate p-values. The signed-rank test additionally excludes zero differences (tied pairs) from the analysis. These tests are standard in medical research, psychology, ecology, and any field where data quality or distribution shape make parametric assumptions questionable.
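As a minimal sketch of the tie handling described above, the Python below assigns average ranks and computes the tie term Σt(t²−1) that enters the variance corrections. The function names `average_ranks` and `tie_term` are illustrative and are not taken from the calculator's source; the ranking helper is written for brevity and runs in quadratic time.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    ordered = sorted(values)
    # A run of c equal values starting at 0-based position p spans ranks
    # p+1 .. p+c, whose mean is p + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def tie_term(values):
    """Sum of t * (t^2 - 1) over groups of t equal values (0 when no ties)."""
    return sum(t * (t * t - 1) for t in Counter(values).values())

abs_diffs = [3, 1, 3, 2, 1, 2]
print(average_ranks(abs_diffs))  # [5.5, 1.5, 5.5, 3.5, 1.5, 3.5]
print(tie_term(abs_diffs))       # 18
```

Each pair of tied values shares the mean of the two ranks it occupies, and each tie group of size 2 contributes 2×(2²−1) = 6 to the tie term.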

📐 Formulas

Rank-Sum: Z = (W − n_s(n+1)/2) / σ_W

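Rank-Sum W: Sum of ranks in the smaller group (size n_s; the larger group has size n_l, and n = n_s + n_l). Expected value = n_s(n+1)/2.

Variance with tie correction: σ²_W = (n_s × n_l / 12) × [(n+1) − T/(n(n−1))], where T = Σtᵢ(tᵢ²−1) summed over tie groups and tᵢ is the number of observations in tie group i.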

U equivalent: U₁ = n₁n₂ + n₁(n₁+1)/2 − W₁
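The rank-sum formulas above can be sketched in Python as follows. This is an illustration under stated assumptions, not the calculator's actual implementation: the function name `rank_sum_test` is hypothetical, no continuity correction is applied, and when the two groups are the same size the smaller of the two rank sums is taken as W.

```python
import math
from collections import Counter

def rank_sum_test(group1, group2):
    """Return (W, Z, two-tailed p) using the normal approximation."""
    combined = list(group1) + list(group2)
    ordered = sorted(combined)
    # Average ranks: a tie run starting at 0-based position p with c members
    # gets rank p + (c + 1) / 2.
    ranks = [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in combined]
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    w1, w2 = sum(ranks[:n1]), sum(ranks[n1:])
    # W is the rank sum of the smaller group. Assumption: when the groups
    # are the same size, take the smaller of the two rank sums.
    if n1 != n2:
        w, n_s, n_l = (w1, n1, n2) if n1 < n2 else (w2, n2, n1)
    else:
        w, n_s, n_l = min(w1, w2), n1, n2
    tie = sum(t * (t * t - 1) for t in Counter(combined).values())
    variance = n_s * n_l / 12 * ((n + 1) - tie / (n * (n - 1)))
    z = (w - n_s * (n + 1) / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p

w, z, p = rank_sum_test([12, 15, 18, 14, 16], [8, 10, 9, 11, 7])
print(w, round(z, 2), round(p, 3))  # 15.0 -2.61 0.009
```

The usage line feeds in the exam-score data from Example 1 on this page.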

Signed-Rank:

1. Compute differences dᵢ = x₁ᵢ − x₂ᵢ. Discard zeros; n is the number of non-zero differences that remain.

2. Rank |dᵢ| in ascending order (average ties).

3. W+ = sum of ranks where dᵢ > 0; W− = sum of ranks where dᵢ < 0.

4. W = min(W+, W−). Expected = n(n+1)/4.

5. σ²_W = n(n+1)(2n+1)/24 − TC/48, where TC = Σtᵢ(tᵢ²−1) (tie correction for signed-rank).

6. Z = (W − E[W]) / σ_W; p-value from standard normal (two-tailed).
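The six signed-rank steps above can be sketched in Python as below. The function name `signed_rank_test` and the sample data are hypothetical, no continuity correction is applied, and the sketch assumes at least one non-zero difference.

```python
import math
from collections import Counter

def signed_rank_test(x1, x2):
    """Return (n, W+, W-, Z, two-tailed p) via the normal approximation."""
    # Step 1: differences, discarding zeros.
    d = [a - b for a, b in zip(x1, x2) if a != b]
    n = len(d)
    # Step 2: average ranks of |d|.
    abs_d = [abs(v) for v in d]
    ordered = sorted(abs_d)
    ranks = [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in abs_d]
    # Step 3: rank sums by sign.
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    # Step 4: test statistic and its expectation.
    w = min(w_plus, w_minus)
    expected = n * (n + 1) / 4
    # Step 5: variance with tie correction.
    tc = sum(t * (t * t - 1) for t in Counter(abs_d).values())
    variance = n * (n + 1) * (2 * n + 1) / 24 - tc / 48
    # Step 6: Z and two-tailed p from the standard normal.
    z = (w - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return n, w_plus, w_minus, z, p

# Hypothetical paired data (not from the examples on this page).
first = [10, 12, 9, 15, 11, 14, 8, 13]
second = [8, 12, 10, 11, 8, 9, 9, 10]
n, w_plus, w_minus, z, p = signed_rank_test(first, second)
print(n, w_plus, w_minus, round(z, 2))  # 7 25.0 3.0 -1.87
```

One of the eight pairs has a zero difference, so the effective n is 7, and W+ + W− = 28 = n(n+1)/2 as a quick sanity check.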

📖 How to Use This Calculator

Select the test type using the tabs: Rank-Sum for two independent unrelated groups, Signed-Rank for paired measurements on the same subjects (before/after, matched pairs).

Enter your data as comma-separated numbers. For Signed-Rank, both groups must have the same number of values - each pair corresponds to one subject.

Click Calculate. For rank-sum, compare W₁ and W₂: the group with the higher mean rank (rank sum divided by group size) tends to have larger values. For signed-rank, compare W+ and W−.

A p-value below 0.05 indicates a statistically significant difference at the conventional α = 0.05 level. If the result is non-significant, check whether your sample size is adequate, since the test may be underpowered.
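For the data-entry step described above, a small Python sketch of parsing comma-separated input and checking the equal-length requirement for paired data might look like this. The function names `parse_values` and `parse_pairs` are hypothetical; how the calculator itself parses input is not stated on this page.

```python
def parse_values(text):
    """Parse a comma-separated string of numbers into a list of floats."""
    return [float(tok) for tok in text.split(",") if tok.strip()]

def parse_pairs(text1, text2):
    """Parse two inputs for the signed-rank test, requiring equal lengths."""
    x1, x2 = parse_values(text1), parse_values(text2)
    if len(x1) != len(x2):
        raise ValueError(
            f"paired data need equal lengths, got {len(x1)} and {len(x2)}"
        )
    return list(zip(x1, x2))

print(parse_values("12, 15, 18, 14, 16"))  # [12.0, 15.0, 18.0, 14.0, 16.0]
```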

💡 Example Calculations

Example 1 - Rank-Sum: exam scores across two teaching methods

Group 1 (new method): 12, 15, 18, 14, 16. Group 2 (traditional): 8, 10, 9, 11, 7. Combined n = 10. Do the teaching methods produce different scores?

Ranks: Sorted: 7(G2,r=1), 8(G2,r=2), 9(G2,r=3), 10(G2,r=4), 11(G2,r=5), 12(G1,r=6), 14(G1,r=7), 15(G1,r=8), 16(G1,r=9), 18(G1,r=10). W₁ = 6+7+8+9+10 = 40. W₂ = 1+2+3+4+5 = 15.

W (smaller group) = 15. E[W] = 5×11/2 = 27.5. Z = (15−27.5)/√(5×5×11/12) = −12.5/4.787 = −2.61. p = 0.009.

Conclusion: Significant difference (p = 0.009). The new method produces higher scores. Every Group 1 score outranks every Group 2 score - a perfect separation.

W = 15, Z = −2.61, p = 0.009 (significant)

Example 2 - Signed-Rank: before/after pain scores

Before treatment: 8, 7, 9, 6, 8, 7. After treatment: 5, 6, 6, 4, 7, 5. Paired data - same 6 patients measured twice. Pain on 1–10 scale.

Differences (before − after): 3, 1, 3, 2, 1, 2. All positive (pain decreased in all patients). n = 6 non-zero differences. Ranks of |d|: 1,1→rank 1.5; 2,2→rank 3.5; 3,3→rank 5.5.

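W+ = 1.5+1.5+3.5+3.5+5.5+5.5 = 21. W− = 0. W = 0 (minimum). E[W] = 6×7/4 = 10.5. Tie correction: three tie groups of size 2 give TC = 3×2×(2²−1) = 18, so σ²_W = 6×7×13/24 − 18/48 = 22.75 − 0.375 = 22.375 and σ_W = 4.73. Z = (0−10.5)/4.73 = −2.22. p = 0.026.

Conclusion: Significant reduction in pain after treatment (p = 0.026). All 6 patients improved, which is the most extreme outcome possible with 6 pairs; the exact two-tailed p-value for that outcome is 2/2⁶ = 0.031. The treatment reduced pain scores in every patient in this small pilot study, but n = 6 is below the range where the normal approximation is reliable.

W⁺ = 21, W⁻ = 0, Z = −2.22, p = 0.026 (significant)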

❓ Frequently Asked Questions

What is the difference between the Wilcoxon rank-sum test and the signed-rank test?

They are two different tests for different data structures. The Wilcoxon rank-sum test (Wilcoxon 1945, Mann-Whitney 1947) is for two independent groups - you have two separate samples with no pairing between observations. It combines both groups, ranks all values, and tests whether the rank sums are consistent with the null of equal distributions. The Wilcoxon signed-rank test is for paired data - each subject has two measurements (before and after, two conditions). It computes the differences, ranks the absolute differences, and tests whether positive and negative differences are balanced. Using rank-sum on paired data discards valuable pairing information and loses power.

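How is the Wilcoxon rank-sum test different from Mann-Whitney U?

They are mathematically equivalent and always give the same p-value. The difference is parameterisation: Wilcoxon's W is the rank sum of one group (this calculator uses the smaller group), whose minimum possible value is n_s(n_s+1)/2, where n_s is the size of that group. Mann-Whitney uses U₁ = n₁n₂ + n₁(n₁+1)/2 − R₁, where R₁ is the rank sum of group 1; the complementary statistic is U₂ = n₁n₂ − U₁ = R₁ − n₁(n₁+1)/2. The two statistics are therefore linearly related: W = U + n_s(n_s+1)/2, with U taken in the rank-sum-minus-minimum form for the same group. Software labels vary: R's 'wilcox.test' reports a statistic called W that is already shifted by the minimum rank sum (so it is numerically a Mann-Whitney U), and SPSS reports U; the p-values agree. This calculator reports both W and the U equivalents for transparency.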
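To illustrate the equivalence, the sketch below computes U₁ two ways: by directly counting pairs, as Mann-Whitney framed it, and from the rank sum R₁ using U₁ = n₁n₂ + n₁(n₁+1)/2 − R₁. The function names are illustrative, and ties between groups are counted as one half, which is an assumption consistent with average ranks.

```python
def u_by_counting(group1, group2):
    """U1: number of (x, y) pairs with x < y, counting ties as 1/2."""
    return sum((x < y) + 0.5 * (x == y) for x in group1 for y in group2)

def u_from_rank_sum(group1, group2):
    """U1 = n1*n2 + n1*(n1+1)/2 - R1, with R1 the rank sum of group 1."""
    combined = list(group1) + list(group2)
    ordered = sorted(combined)
    ranks = [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in combined]
    n1, n2 = len(group1), len(group2)
    r1 = sum(ranks[:n1])
    return n1 * n2 + n1 * (n1 + 1) / 2 - r1

a = [12, 15, 18, 14, 16]  # Group 1 from Example 1
b = [8, 10, 9, 11, 7]     # Group 2 from Example 1
print(u_by_counting(a, b), u_from_rank_sum(a, b))  # 0.0 0.0
print(u_by_counting(b, a), u_from_rank_sum(b, a))  # 25.0 25.0
```

The two U values always sum to n₁n₂ (here 25), and a U of 0 corresponds to perfect separation between the groups.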

What does the Wilcoxon signed-rank test actually test?

The signed-rank test assumes the distribution of the paired differences is symmetric and tests whether its centre (the median, which equals the mean under symmetry) is zero. Put another way, the null hypothesis is that the differences are symmetrically distributed around zero. It does this by ranking the absolute differences, then testing whether the sum of ranks for positive differences (W+) is significantly different from the expected value n(n+1)/4, where n is the number of non-zero differences. If positive differences tend to be larger (higher ranks) than negative differences, W+ will be large and the test will be significant. It is more powerful than the sign test (which only counts the number of positive vs negative differences) because it uses the magnitude of differences as well as their sign.
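The contrast with the sign test can be seen in a small sketch: two hypothetical sets of differences with the same number of positive signs but very different W+. The data and function names are made up for illustration, and the helper assumes no tied absolute differences.

```python
def sign_count(diffs):
    """Number of positive differences (the sign test statistic)."""
    return sum(1 for d in diffs if d > 0)

def w_plus(diffs):
    """Sum of the ranks of |d| over positive differences (no ties assumed)."""
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(abs(d) for d in nonzero)
    return sum(ordered.index(abs(d)) + 1 for d in nonzero if d > 0)

big_gains = [5, 6, 7, -1, -2]    # positives have the largest magnitudes
small_gains = [1, 2, 3, -6, -7]  # positives have the smallest magnitudes
print(sign_count(big_gains), sign_count(small_gains))  # 3 3
print(w_plus(big_gains), w_plus(small_gains))          # 12 6
```

Both sets have three positive differences out of five, so the sign test cannot tell them apart, while W+ lands on opposite sides of its expected value n(n+1)/4 = 7.5.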

When should I use signed-rank instead of a paired t-test?

Use the Wilcoxon signed-rank test instead of a paired t-test when: (1) the differences between pairs are not normally distributed (especially for small samples with n < 25–30 where you cannot rely on the CLT); (2) you have outliers in the differences that would distort the mean; (3) your data is ordinal and differences are not truly meaningful numerically. The paired t-test is more powerful when differences are normally distributed. The Wilcoxon signed-rank test is about 95% as efficient as the paired t-test for normal data (asymptotic relative efficiency = 3/π), so you rarely lose much by using it even when the t-test assumptions are met.

What happens to zero differences in the signed-rank test?

Pairs where the two measurements are identical (difference = 0) are excluded from the test because they provide no information about the direction of change. Only the non-zero differences are ranked. This means the effective sample size n for the test is the number of non-zero differences, not the total number of pairs. If many pairs are tied (difference = 0), the test loses power significantly, and the exact binomial sign test (which counts how many differences are positive vs negative) may be more appropriate.
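The exact binomial sign test mentioned above is short enough to sketch directly. The function name `sign_test_p` and the data are hypothetical; the two-tailed p-value is taken as twice the smaller binomial tail, capped at 1.

```python
from math import comb

def sign_test_p(diffs):
    """Exact two-tailed binomial sign test p-value, ignoring zero differences."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)                      # effective sample size
    k = sum(1 for d in nonzero if d > 0)  # number of positive differences
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return n, min(1.0, 2 * tail)

# Hypothetical differences: 10 pairs, 4 of them tied.
print(sign_test_p([2, 0, 1, 0, 3, 0, -1, 2, 0, 4]))  # (6, 0.21875)
```

Ten pairs were collected, but only six carry information, which shows how quickly zero differences erode the effective sample size.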

Can I use these tests for one-tailed hypotheses?

Yes. A one-tailed Wilcoxon test is appropriate when you have a strong prior directional hypothesis. For the rank-sum test, a one-tailed p-value is half the two-tailed p-value (for the direction that matches your hypothesis). For the signed-rank test, one-tailed tests whether the median difference is positive or negative. However, like all one-tailed tests, you must pre-commit to the direction before collecting data - otherwise you inflate the false positive rate.
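Under the normal approximation, the relationship between one-tailed and two-tailed p-values can be sketched as follows. The helper names are illustrative, and the Z value in the usage line is simply a plausible input.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def p_values(z):
    """Return (two-tailed, lower-tail, upper-tail) p-values for a Z statistic."""
    lower = normal_cdf(z)
    upper = 1.0 - lower
    return 2 * min(lower, upper), lower, upper

two_tailed, lower, upper = p_values(-2.61)
print(round(two_tailed, 4), round(lower, 4), round(upper, 4))
```

For a negative Z, the lower-tail p-value is exactly half the two-tailed one, while the upper-tail p-value is close to 1: choosing the wrong direction after seeing the data is not a neutral mistake.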

What is the exact distribution for the signed-rank test?

For small samples (n ≤ 25), the exact null distribution of W+ can be computed by enumerating all 2^n equally likely sign assignments of the ranks. The p-value is the proportion of sign assignments producing a W+ at least as extreme as the one observed. For n > 25, the normal approximation Z = (W+ − n(n+1)/4) / √(n(n+1)(2n+1)/24 − TC/48) is commonly used, where TC is the tie correction Σtᵢ(tᵢ²−1). This calculator uses the normal approximation for all samples, which is reasonably accurate for n ≥ 10 non-zero differences; for smaller n, treat the reported p-value as approximate.
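The enumeration described above can be sketched in a few lines of Python. This is a brute-force illustration suitable only for small n (it visits all 2^n assignments); the function name is hypothetical, and the two-tailed p-value is taken as twice the smaller tail, capped at 1.

```python
from itertools import product

def exact_signed_rank_p(ranks, w_plus_observed):
    """Exact two-tailed p-value by enumerating all 2^n sign assignments."""
    count_le = count_ge = total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        # W+ for this assignment: sum of ranks whose sign is positive.
        w = sum(r for r, s in zip(ranks, signs) if s)
        count_le += w <= w_plus_observed
        count_ge += w >= w_plus_observed
        total += 1
    # Two-tailed: double the smaller tail, capped at 1.
    return min(1.0, 2 * min(count_le, count_ge) / total)

ranks = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]  # ranks of |d| from Example 2
print(exact_signed_rank_p(ranks, 21))   # 0.03125
```

With six pairs, only one of the 64 sign assignments gives W+ = 21, so the exact two-tailed p-value is 2/64.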

What sample size do I need for the signed-rank test?

With exact p-values, you need at least 6 non-zero differences for a two-tailed p-value below 0.05 to be achievable: the most extreme outcome has p = 2/2ⁿ, which is 0.0625 for n = 5 and 0.031 for n = 6. A practical minimum is 10 non-zero pairs. For 80% power to detect a medium effect (Cohen's d = 0.5 on the differences) at α = 0.05, you need roughly 35 pairs for a two-tailed test, or about 28 for a one-tailed test, assuming normally distributed differences. Note that zero differences reduce the effective n, so collect more subjects if you expect many tied differences. The test is most efficient when differences are roughly symmetric and has nearly as much power as the paired t-test for normal data.
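The minimum-sample-size claim follows from simple arithmetic on the exact distribution, sketched here. The function name is illustrative; the formula assumes the most extreme outcome (all differences sharing one sign) and a two-tailed test.

```python
def min_two_tailed_p(n):
    """Smallest exact two-tailed p-value with n non-zero differences."""
    # Only 2 of the 2^n sign assignments are this extreme: all + or all -.
    return 2 / 2 ** n

for n in range(4, 9):
    print(n, min_two_tailed_p(n))
```

The printed values drop below 0.05 for the first time at n = 6.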

How do I report Wilcoxon test results in a paper?

Report the test statistic (W for rank-sum, W+ for signed-rank), the sample sizes, and the p-value. Example (rank-sum, with W taken as the rank sum of Group 1 and a normal-approximation p-value): 'Group 1 scores (Mdn = 6.5) were significantly higher than Group 2 scores (Mdn = 3.5), W = 89, n₁ = 8, n₂ = 8, p = 0.027 (two-tailed).' Always report medians rather than means with Wilcoxon tests since the test is rank-based. Include the effect size (rank-biserial r for rank-sum, or matched-pairs r = Z/√N for signed-rank). Specify whether you used a continuity correction or an exact p-value, and state which definition of W your software reports.
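The two effect sizes named above are simple to compute; a hedged sketch follows. The rank-biserial form used here is r = 1 − 2U/(n₁n₂) with U the smaller of U₁ and U₂, which is one common definition rather than something stated on this page. Whether N in Z/√N counts pairs or observations varies by convention, so the caller supplies it. The inputs in the usage lines are illustrative.

```python
import math

def rank_biserial(u, n1, n2):
    """Rank-biserial r = 1 - 2U / (n1 * n2), with U the smaller of U1 and U2."""
    return 1 - 2 * u / (n1 * n2)

def r_from_z(z, n):
    """Effect size r = Z / sqrt(N); the meaning of N is the caller's choice."""
    return z / math.sqrt(n)

print(rank_biserial(0, 5, 5))        # 1.0 (perfect separation)
print(round(r_from_z(-2.2, 20), 2))  # -0.49
```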

📌 Quick Tips

💡Use the Rank-Sum tab for two independent groups (unpaired). Use the Signed-Rank tab for two measurements on the same subjects (paired/before-after).

💡The Signed-Rank test drops zero differences (tied pairs) before ranking. Pairs where the difference is zero contribute nothing to the test statistic.

💡Both tests are valid for ordinal data (ranks, Likert scales) and non-normal continuous data. They are robust to outliers because they use ranks, not raw values.
