# Simple linear regression

A **simple linear regression** is a linear regression in which there is only one covariate (predictor variable). It is a special case of multiple regression. Simple linear regression is used to evaluate the linear relationship between two variables; one example is the relationship between muscle strength and lean body mass. Put another way, simple linear regression is used to develop an equation by which we can predict or estimate a dependent variable given an independent variable. The regression equation is given by

: $Y = a + bX + \varepsilon$

where $Y$ is the dependent variable, $a$ is the $y$-intercept, $b$ is the gradient or slope of the line, $X$ is the independent variable and $\varepsilon$ is a random error term. The strength of the linear relationship between the dependent and independent variables can be measured using a correlation coefficient, e.g. the Pearson product-moment correlation coefficient.

**Estimating the regression line**

The parameters of the linear regression line, $Y = a + bX$, can be estimated using the method of ordinary least squares. This method finds the line that minimizes the sum of the squares of the regression residuals, $\sum_{i=1}^{N} \hat{\varepsilon}_i^2$. The residual is the difference between the observed value and the predicted value: $\hat{\varepsilon}_i = y_i - \hat{y}_i$. The minimization problem can be solved using calculus, producing the following formulas for the estimates of the regression parameters:

: $\hat{b} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$

: $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$
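The two formulas above translate directly into code. The following is a minimal sketch in plain Python; the function name `fit_line` and the sample data are illustrative assumptions, not part of the original article.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a_hat, b_hat) for the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Numerator: sum of cross-deviations; denominator: sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x-values are identical; the slope is undefined")
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Hypothetical data lying close to the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a_hat, b_hat = fit_line(xs, ys)
print(f"intercept = {a_hat:.3f}, slope = {b_hat:.3f}")
```

The slope is computed first because the intercept formula depends on it. The guard against identical x-values reflects the fact that the denominator of $\hat{b}$ would otherwise be zero.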

The ordinary least squares fit has the following properties:

1. The line goes through the point $(\bar{x}, \bar{y})$.
2. The sum of the residuals is equal to zero.
3. The linear combination of the residuals in which the coefficients are the $x$-values is equal to zero: $\sum_{i=1}^{N} x_i \hat{\varepsilon}_i = 0$.
4. The estimates are unbiased, provided the error term has mean zero.

**Alternative formulas for the slope coefficient**

There are alternative (and simpler) formulas for calculating $\hat{b}$:

: $\hat{b} = \frac{\sum_{i=1}^{N} x_i y_i - N \bar{x} \bar{y}}{\sum_{i=1}^{N} x_i^2 - N \bar{x}^2} = r \frac{s_y}{s_x}$

Here, $r$ is the correlation coefficient of $X$ and $Y$, $s_x$ is the sample standard deviation of $X$ and $s_y$ is the sample standard deviation of $Y$.

**Inference**

Under the assumption that the error term is normally distributed, the estimate of the slope coefficient has a normal distribution with mean equal to $b$ and standard error given by:

: $s_{\hat{b}} = \sqrt{\frac{\sum_{i=1}^{N} \hat{\varepsilon}_i^2 / (N-2)}{\sum_{i=1}^{N} (x_i - \bar{x})^2}}.$

A confidence interval for $b$ can be created using a t-distribution with $N-2$ degrees of freedom:

: $[\hat{b} - s_{\hat{b}}\, t_{N-2}^*, \; \hat{b} + s_{\hat{b}}\, t_{N-2}^*]$

where $t_{N-2}^*$ is the critical value of that distribution for the chosen confidence level.
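The standard error and interval formulas can be sketched in a few lines of Python. The function name `slope_confidence_interval`, the parameter `t_crit` and the data are assumptions made for illustration. The critical value is passed in by the caller (for instance from a t-table) rather than computed, so the sketch needs only the standard library.

```python
from math import sqrt
from statistics import mean


def slope_confidence_interval(xs, ys, t_crit):
    """Return (b_hat, se_b, (lower, upper)) for the slope of y = a + b*x.

    t_crit is the critical value of the t-distribution with N - 2 degrees
    of freedom; it is supplied by the caller, not computed here.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations (N - 2 > 0)")
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x-values are identical; the slope is undefined")
    b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a_hat = y_bar - b_hat * x_bar
    # Sum of squared residuals, divided by its N - 2 degrees of freedom.
    ssr = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
    se_b = sqrt(ssr / (n - 2) / s_xx)
    return b_hat, se_b, (b_hat - se_b * t_crit, b_hat + se_b * t_crit)


# Hypothetical data; 3.182 is the tabulated two-sided 95% critical value
# of the t-distribution with N - 2 = 3 degrees of freedom.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
b_hat, se_b, (lower, upper) = slope_confidence_interval(xs, ys, t_crit=3.182)
print(f"slope = {b_hat:.3f}, s.e. = {se_b:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

If SciPy is available, the critical value could instead be obtained with `scipy.stats.t.ppf(0.975, n - 2)` for a 95% interval.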

**Numerical example**

Suppose we have the sample of points {(1, −1), (2, 4), (6, 3)}. The mean of $X$ is 3 and the mean of $Y$ is 2. The slope coefficient estimate is given by:

: $\hat{b} = \frac{(1 - 3)((-1) - 2) + (2 - 3)(4 - 2) + (6 - 3)(3 - 2)}{(1 - 3)^2 + (2 - 3)^2 + (6 - 3)^2} = 7/14 = 0.5$

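The intercept estimate is $\hat{a} = 2 - 0.5 \times 3 = 0.5$, so the residuals are −2, 2.5 and −0.5 and their sum of squares is 10.5. The standard error of the slope coefficient is therefore $\sqrt{(10.5/1)/14} \approx 0.866$. With $N - 2 = 1$ degree of freedom, the critical value for a 95% confidence interval is $t_1^* = 12.7062$, and the interval is given by

: [0.5 − 0.866 × 12.7062, 0.5 + 0.866 × 12.7062] = [−10.504, 11.504].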
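The same three sample points can be used to check the algebraic properties of the least-squares fit and the alternative slope formulas stated earlier. The sketch below is illustrative only, and its variable names are assumptions. Unbiasedness concerns the average behaviour of the estimates over repeated samples, so it cannot be checked from a single sample and is left out.

```python
from math import isclose, sqrt
from statistics import mean, stdev

# Sample points from the numerical example.
xs = [1, 2, 6]
ys = [-1, 4, 3]
n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)

# Least-squares estimates from the deviation form of the formulas.
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_yy = sum((y - y_bar) ** 2 for y in ys)
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

# Property 1: the fitted line passes through (x_bar, y_bar).
assert isclose(a_hat + b_hat * x_bar, y_bar)
# Property 2: the residuals sum to zero.
assert isclose(sum(residuals), 0.0, abs_tol=1e-12)
# Property 3: the x-weighted sum of the residuals is zero.
assert isclose(sum(x * e for x, e in zip(xs, residuals)), 0.0, abs_tol=1e-12)

# Alternative slope formula based on raw sums.
b_raw = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
    sum(x ** 2 for x in xs) - n * x_bar ** 2
)
# Alternative slope formula based on the correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)        # Pearson correlation coefficient
b_corr = r * stdev(ys) / stdev(xs)  # sample standard deviations (N - 1 divisor)

assert isclose(b_hat, b_raw) and isclose(b_hat, b_corr)
print(b_hat, b_raw, b_corr)
```

All three slope values agree for this sample, which is what the identity between the formulas predicts.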

*Wikimedia Foundation. 2010.*
