Residual Calculator

Analyse regression fit quality by computing residuals, SSR, RMSE, and standardised residuals.


📖 What are Residuals in Regression?

A residual is the difference between an observed data point and the value predicted by a statistical model: eᵢ = yᵢ − ŷᵢ. In linear regression, the model predicts an output value ŷ for each input x. The residual for observation i is how far the actual observed y value lies from that predicted line. A positive residual means the point is above the regression line; a negative residual means it is below.

Residuals are one of the most important diagnostic tools in statistics. They reveal whether the model is appropriate for the data, whether the assumptions of linear regression hold, and whether any observations are unusually influential. Ordinary Least Squares (OLS) regression finds the line that minimises the sum of squared residuals (SSR = Σeᵢ²) - this is literally what "least squares" means.

The sum of squared residuals measures total unexplained variation. The RMSE (root mean square error) is the typical prediction error in original units. Standardised residuals - residuals divided by the standard error - put all residuals on a common scale so that outliers can be identified regardless of the measurement units. A standardised residual beyond ±2 indicates a point that is more than two standard errors away from the predicted value, which occurs in only about 5% of observations under normal assumptions.

Residual analysis is an essential step after fitting any regression model. Random scatter in residual plots confirms that the model is appropriate; systematic patterns signal problems such as non-linearity, heteroscedasticity, or omitted variables that require model revision.
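As a minimal sketch of the definition above, residuals can be computed directly from a fitted line; the intercept, slope, and data below are hypothetical:

```python
# Sketch: residuals from a fitted line yhat = b0 + b1*x (all numbers hypothetical)
b0, b1 = 2.0, 3.0                                        # assumed intercept and slope
x = [1.0, 2.0, 3.0, 4.0]
y = [5.5, 7.6, 11.2, 13.9]                               # observed y_i

predicted = [b0 + b1 * xi for xi in x]                   # yhat_i = b0 + b1 * x_i
residuals = [yi - yh for yi, yh in zip(y, predicted)]    # e_i = y_i - yhat_i

# Positive entries lie above the line, negative entries below it
print([round(e, 2) for e in residuals])                  # [0.5, -0.4, 0.2, -0.1]
```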

📐 Formulas

Residual: eᵢ = yᵢ − ŷᵢ

Predicted value (from regression equation): ŷᵢ = b₀ + b₁·xᵢ, where b₀ = intercept, b₁ = slope.

Sum of Squared Residuals (SSR): SSR = Σᵢ eᵢ² = Σᵢ (yᵢ − ŷᵢ)²

RMSE (Root Mean Square Error): RMSE = √(SSR / n)

Standard Error of Residuals (SER): SER = √(SSR / (n − 2)) for simple linear regression with one predictor.

Standardised Residual: zᵢ = eᵢ / SER

Outlier flag: |zᵢ| > 2 (potential outlier); |zᵢ| > 3 (strong outlier).

Mean Residual (OLS property): ē = (1/n) Σeᵢ = 0 when regression includes an intercept.

All variables: yᵢ = observed value; ŷᵢ = predicted value; b₀ = intercept; b₁ = slope; n = number of observations; SER = standard error of residuals.
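The formulas above can be bundled into one helper. This is a sketch in plain Python (the function name is ours), checked against Example 1 from this page:

```python
import math

def residual_diagnostics(y, yhat):
    """SSR, RMSE, SER and standardised residuals per the formulas above.
    SER uses n - 2 degrees of freedom (simple regression, one predictor)."""
    n = len(y)
    e = [yi - yh for yi, yh in zip(y, yhat)]        # e_i = y_i - yhat_i
    ssr = sum(ei ** 2 for ei in e)                  # SSR = sum of e_i^2
    rmse = math.sqrt(ssr / n)                       # RMSE = sqrt(SSR / n)
    ser = math.sqrt(ssr / (n - 2))                  # SER = sqrt(SSR / (n - 2))
    z = [ei / ser for ei in e]                      # z_i = e_i / SER
    flags = [i + 1 for i, zi in enumerate(z) if abs(zi) > 2]  # 1-based outliers
    return ssr, rmse, ser, z, flags

# Example 1's data from this page:
ssr, rmse, ser, z, flags = residual_diagnostics(
    [10, 12, 14, 16, 18], [9.5, 12.1, 13.8, 16.3, 18.4])
print(round(ssr, 2), round(rmse, 3), flags)         # 0.55 0.332 []
```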

📖 How to Use This Calculator

1. Select input mode: use "Observed vs Predicted" if you already have both y and ŷ values from a regression package; use "Regression equation" if you have the slope and intercept and want the calculator to compute ŷᵢ = b₀ + b₁·xᵢ automatically.
2. Enter all values as comma-separated lists. Make sure the observed Y and predicted Ŷ (or X) lists have the same number of entries. Click Calculate Residuals.
3. Review the summary metrics (SSR, RMSE, mean residual) and the detailed table showing each observation's residual and standardised residual. Potential outliers are highlighted automatically.

💡 Example Calculations

Example 1 - Simple regression residuals

1. Observed Y: 10, 12, 14, 16, 18. Predicted Ŷ: 9.5, 12.1, 13.8, 16.3, 18.4.
2. Residuals: e₁ = 0.5, e₂ = −0.1, e₃ = 0.2, e₄ = −0.3, e₅ = −0.4. Mean residual = −0.02, close to 0 (consistent with OLS).
3. SSR = 0.25 + 0.01 + 0.04 + 0.09 + 0.16 = 0.55. RMSE = √(0.55/5) ≈ 0.332. All standardised residuals are small - no outliers detected.
Result: SSR = 0.55; RMSE ≈ 0.332; no outliers.

Example 2 - Forecasting errors with an outlier

1. Observed: 100, 105, 102, 98, 145. Predicted: 101, 104, 103, 99, 103. Residuals: −1, 1, −1, −1, 42.
2. SSR = 1 + 1 + 1 + 1 + 1764 = 1768. RMSE = √(1768/5) ≈ 18.8. The 5th observation, with a residual of 42, is clearly anomalous.
3. SER = √(1768/3) ≈ 24.3. Standardised residual for observation 5 = 42/24.3 ≈ 1.73 - below 2 only because the SER itself is inflated by the outlier. This illustrates how a single outlier can mask itself by inflating the SER.
Result: SSR = 1768; RMSE ≈ 18.8; outlier at observation 5.
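The masking effect in step 3 can be reproduced numerically. The hold-one-out recomputation below is our illustration of the idea behind deleted (externally studentised) residuals, not something the calculator performs:

```python
import math

e = [-1, 1, -1, -1, 42]                       # Example 2's residuals
ssr = sum(x * x for x in e)                   # 1768
ser = math.sqrt(ssr / (len(e) - 2))           # ~24.3, inflated by the outlier
z5 = e[-1] / ser                              # ~1.73: below the |z| > 2 cutoff

# Held-out recomputation (our illustration): SER from the other four residuals
ser_wo = math.sqrt(sum(x * x for x in e[:-1]) / (len(e) - 1 - 2))  # sqrt(4/2)
z5_wo = e[-1] / ser_wo                        # ~29.7: the outlier is now obvious
print(round(z5, 2), round(z5_wo, 1))          # 1.73 29.7
```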

❓ Frequently Asked Questions

What is a residual in regression?
A residual is the difference between an observed value and the value predicted by the regression model: eᵢ = yᵢ − ŷᵢ. A positive residual means the model underpredicted; a negative residual means it overpredicted. Residuals represent the unexplained variation - the part of the outcome that the model could not account for. Together, the set of residuals tells you how well the regression line fits the data and whether any assumptions of linear regression are violated.
What does a pattern in residuals indicate?
In a well-fitting OLS regression, residuals should appear random and structureless when plotted against fitted values. A curved pattern (U-shape or arch) suggests the relationship is non-linear and a linear model is inappropriate. A funnel shape (residuals spread out as fitted values increase) indicates heteroscedasticity - non-constant variance - which violates OLS assumptions and can make standard errors unreliable. A systematic pattern by time or group suggests omitted variables. Only random scatter indicates the model assumptions are met.
Why should the sum of residuals equal zero in OLS regression?
In ordinary least squares regression with an intercept term, the sum (and mean) of residuals is exactly zero by mathematical necessity. The OLS estimator is derived by minimising the sum of squared residuals, and the first-order conditions require that the regression line passes through the point (x̄, ȳ). This constraint forces the positive and negative residuals to cancel exactly. If your sum of residuals is not zero (or very close to it), you may have used a regression without an intercept or made a calculation error.
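This zero-sum property is easy to verify: fit OLS with an intercept using the usual closed-form slope and intercept (the data below are hypothetical) and sum the residuals:

```python
# OLS with an intercept: residuals sum to (numerically) zero by construction
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]                       # hypothetical data

xbar, ybar = sum(x) / len(x), sum(y) / len(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx   # slope
b0 = ybar - b1 * xbar                    # intercept: line passes through (xbar, ybar)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(abs(sum(residuals)) < 1e-9)                    # True
```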
What are standardised residuals and why are they useful?
Standardised residuals divide each residual by the standard error of the residuals (SER = √(SSR/(n − 2)), close to the RMSE when n is large): eᵢ/SER. This converts residuals to a common scale regardless of the original units, making it easy to compare residuals across models with different scales. Under OLS assumptions, standardised residuals should be roughly normally distributed with mean 0 and standard deviation 1. Values beyond ±2 occur roughly 5% of the time by chance and are flagged as potential outliers; values beyond ±3 are very unusual and strong outlier candidates.
What is the sum of squared residuals (SSR) and how does it relate to R²?
SSR (also called RSS or SSE - sum of squared errors) is Σ(yᵢ − ŷᵢ)², the total unexplained variation in the outcome. It is used to compute R² as: R² = 1 − SSR/SST, where SST = Σ(yᵢ − ȳ)² is the total variation. A lower SSR means the model fits better. R² ranges from 0 (model explains nothing) to 1 (perfect fit). RMSE = √(SSR/n) and is expressed in the same units as y, making it more interpretable than SSR alone.
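A quick check of that relationship, using Example 1's observed and predicted values (SST is computed around ȳ = 14):

```python
# R^2 = 1 - SSR/SST, using Example 1's values from this page
y    = [10, 12, 14, 16, 18]
yhat = [9.5, 12.1, 13.8, 16.3, 18.4]

ybar = sum(y) / len(y)                                # 14
sst = sum((yi - ybar) ** 2 for yi in y)               # 16 + 4 + 0 + 4 + 16 = 40
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # 0.55
r2 = 1 - ssr / sst
print(round(r2, 3))   # 0.986 -- the line explains ~98.6% of the variation
```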
How do I interpret RMSE from the residual analysis?
RMSE (Root Mean Square Error) = √(SSR/n) is the typical size of a prediction error. If you are modelling house prices and RMSE = 25,000, it means predictions are typically off by £25,000. A lower RMSE indicates a better fit, but RMSE must be compared to the scale of the outcome variable - an RMSE of 25,000 is excellent if prices are £5M but poor if prices are £100,000. RMSE is sensitive to outliers because errors are squared before averaging.
What does it mean for a data point to be a high-leverage point vs an outlier?
An outlier is a point with an unusually large residual - the observed value is far from what the model predicts. A high-leverage point has an unusual x-value (far from the mean of x) but may still lie close to the regression line. The most concerning points are high-leverage points that are also outliers - these are called influential observations and can dramatically shift the regression coefficients. This calculator identifies outliers by standardised residuals; leverage analysis requires the hat matrix and is beyond the scope of this tool.
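Although the calculator stops at standardised residuals, leverage in simple regression has the closed form hᵢ = 1/n + (xᵢ − x̄)²/Σⱼ(xⱼ − x̄)², so a sketch is straightforward (hypothetical data; the extreme x-value in the last point is deliberate):

```python
# Leverage in simple regression: h_i = 1/n + (x_i - xbar)^2 / Sxx
x = [1.0, 2.0, 3.0, 4.0, 15.0]           # hypothetical; last x is deliberately extreme
n = len(x)
xbar = sum(x) / n                        # 5.0
sxx = sum((xi - xbar) ** 2 for xi in x)  # 130.0
leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# The extreme-x point dominates; leverages always sum to 2 here (intercept + slope)
print([round(h, 2) for h in leverage])   # [0.32, 0.27, 0.23, 0.21, 0.97]
```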
Can I use this calculator for multiple regression residuals?
Yes, if you have already run a multiple regression and have the predicted values (ŷᵢ), you can enter the observed and predicted values directly using the first input mode. The calculator does not compute the regression coefficients for multiple regression - you would need to obtain those from a statistics package. It then computes all residual diagnostics (SSR, RMSE, standardised residuals, outliers) on the provided values.