Linear Regression Calculator

Fit a straight line to your data and get slope, intercept, R², and predictions instantly.


📖 What is Linear Regression?

Linear regression is the most fundamental statistical modelling technique. It models the relationship between a continuous dependent variable (Y) and one or more independent variables (X) by fitting a straight line through the data. The goal is to find the line that best represents the underlying relationship - specifically, the line that minimises the sum of squared vertical distances from each data point to the line.

Developed in the early 19th century by Gauss and Legendre, the method of least squares underpins an enormous range of applications: predicting house prices from square footage, estimating the effect of advertising spend on sales, calibrating scientific instruments, and quantifying the relationship between a drug dose and patient response.

The fitted line y = mx + b makes two key assumptions: linearity (the true relationship is roughly linear) and homoscedasticity (the variance of residuals is roughly constant across X values). When these are violated, alternatives like polynomial regression, weighted regression, or non-linear regression are needed.

This calculator performs simple linear regression (one predictor) using the ordinary least squares method and reports the regression equation, R², correlation coefficient, standard error of regression, and residuals.

📐 Formulas

ŷ = mx + b

Slope: m = [n·Σ(xy) − Σx·Σy] / [n·Σ(x²) − (Σx)²]

Intercept: b = ȳ − m·x̄

R² (coefficient of determination): R² = 1 − SS_res/SS_tot

where SS_res = Σ(yᵢ − ŷᵢ)² and SS_tot = Σ(yᵢ − ȳ)²

Correlation coefficient: r = [n·Σ(xy) − Σx·Σy] / √{[n·Σx² − (Σx)²][n·Σy² − (Σy)²]}

Note: r = √R² with the sign of m. In simple regression R² = r².

Standard error of regression: SER = √[SS_res / (n−2)]

Residual for point i: eᵢ = yᵢ − ŷᵢ = yᵢ − (mxᵢ + b)
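As a sketch, the formulas above translate directly into Python. The function name `simple_linreg` and the input validation are my own choices, not part of the calculator:

```python
import math

def simple_linreg(xs, ys):
    """Ordinary least squares fit of y = m*x + b using the summation
    formulas above. Returns (m, b, r2, r, ser)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length lists with at least 3 points")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Slope and intercept from the closed-form least squares solution.
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = sy / n - m * sx / n
    # R² from the residual and total sums of squares.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - sy / n) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    # Pearson correlation and standard error of the regression.
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    ser = math.sqrt(ss_res / (n - 2))
    return m, b, r2, r, ser
```

Because the least squares problem has a closed-form solution, no iteration or optimisation library is needed.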

📖 How to Use This Calculator

1. Enter the X values (independent variable) as a comma-separated list. Then enter the corresponding Y values in the same order. Both lists must have the same number of values.
2. Click Calculate Linear Regression. The regression equation, slope, intercept, R², r, and standard error appear instantly.
3. To predict a Y value, enter any X in the Prediction field. The predicted Y (ŷ) from the fitted line is shown immediately.
4. Interpret R²: > 0.90 indicates an excellent fit, 0.70–0.90 a good fit, 0.50–0.70 a moderate fit, and < 0.50 a poor fit (for most applications).

📝 Example Calculations

Example 1 - Study Hours vs Exam Score

X (hours): 1, 2, 3, 4, 5. Y (score): 55, 62, 70, 75, 82.

Σx=15, Σy=344, Σx²=55, Σxy=1099, n=5

m = (5×1099 − 15×344)/(5×55 − 225) = (5495−5160)/(275−225) = 335/50 = 6.70

b = 344/5 − 6.70×15/5 = 68.8 − 20.1 = 48.7

Equation: ŷ = 6.70x + 48.7. R² ≈ 0.996 - an excellent fit.

Example 2 - Advertising vs Sales

X (ad spend £000): 10, 20, 30, 40, 50. Y (sales £000): 120, 145, 175, 195, 230.

m = 2.70, b = 92.0. Equation: ŷ = 2.70x + 92.0

R² ≈ 0.995. Prediction at x = 35: ŷ = 2.70×35 + 92.0 = 186.5 (£186,500 expected sales).

Example 3 - Temperature vs Ice Cream Sales

X (°C): 20, 25, 30, 35, 40. Y (units/day): 80, 120, 160, 200, 250.

m = 8.40, b = −90.0. Equation: ŷ = 8.40x − 90.0

R² ≈ 0.998. At 38°C: ŷ = 8.40×38 − 90.0 = 229.2, so about 229 units expected.

Example 4 - Poor Fit Example

X: 1, 2, 3, 4, 5. Y: 10, 40, 20, 45, 15. Random scatter with no trend.

R² ≈ 0.02 - the linear model explains almost none of the variation. A different model (or accepting that there is no relationship) is needed.

❓ Frequently Asked Questions

What is linear regression?
Linear regression is a statistical method that models the relationship between a dependent variable (Y) and one or more independent variables (X) using a straight line. Simple linear regression fits a line y = mx + b that minimises the sum of squared residuals (the vertical distances between data points and the line).
What is the least squares method?
The least squares method finds the slope (m) and intercept (b) that minimise the sum of squared residuals Σ(yᵢ − ŷᵢ)². This gives the best-fit line - the line that is closest to all data points simultaneously. The formulas for m and b are derived analytically and have closed-form solutions.
What is R-squared in linear regression?
R² (coefficient of determination) measures the proportion of variance in Y explained by the regression model. R² = 1 − SS_res/SS_tot, where SS_res = Σ(yᵢ−ŷᵢ)² and SS_tot = Σ(yᵢ−ȳ)². R² = 0.85 means 85% of the variance in Y is explained by X. R² = 1 means a perfect fit; R² = 0 means the model explains nothing.
What is the difference between correlation and regression?
Correlation (r) measures the strength and direction of a linear relationship between two variables. Regression goes further - it fits a model to predict Y from X. Note: r² = R² in simple linear regression. Correlation is symmetric (same if you swap X and Y), but regression is not (slope changes if you swap X and Y).
What are residuals in regression?
A residual is the difference between the observed Y value and the predicted ŷ value from the regression line: eᵢ = yᵢ − ŷᵢ. Residuals should be randomly scattered around zero (a random pattern in a residual plot). Patterns in residuals (curves, funnels) suggest the linear model is not appropriate.
What is the standard error of regression?
The standard error of the regression (SER or s) measures the typical size of residuals: SER = √[Σ(yᵢ−ŷᵢ)²/(n−2)]. A smaller SER means the data points are closer to the regression line. It is used in computing confidence intervals for predictions.
How do I interpret the slope in a regression equation?
The slope m represents the average change in Y for a one-unit increase in X, holding everything else constant. For example, if Y is salary (£) and X is years of experience, m = 3,500 means each additional year of experience is associated with £3,500 higher salary on average.
What is the difference between simple and multiple linear regression?
Simple linear regression has one predictor variable (X). Multiple linear regression has two or more predictor variables (X₁, X₂, ..., Xₖ). This calculator handles simple linear regression. Multiple regression requires matrix algebra and is available in statistical software.
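As an illustration of that matrix-algebra step (assuming NumPy is installed; the data here are made up), a two-predictor fit via least squares:

```python
import numpy as np  # assumes NumPy is installed

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix: a column of ones for the intercept, then the predictors.
A = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares solution: coefficients [intercept, b1, b2].
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```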