Correlation ratio

In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. The measure is defined as the ratio of two standard deviations representing these types of variation. The context here is the same as that of the intraclass correlation coefficient, whose value is the square of the correlation ratio.
Definition
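Suppose each observation is y_{xi}, where x indicates the category that the observation is in and i is the label of the particular observation. Let n_{x} be the number of observations in category x and
 \bar{y}_{x} = \frac{\sum_{i} y_{xi}}{n_{x}}
and
 \bar{y} = \frac{\sum_{x} n_{x} \bar{y}_{x}}{\sum_{x} n_{x}},
where \bar{y}_{x} is the mean of category x and \bar{y} is the mean of the whole population. The correlation ratio η (eta) is defined so as to satisfy
 \eta^{2} = \frac{\sum_{x} n_{x} (\bar{y}_{x} - \bar{y})^{2}}{\sum_{x,i} (y_{xi} - \bar{y})^{2}},
which can be written as
 \eta^{2} = \frac{\sigma_{\bar{y}}^{2}}{\sigma_{y}^{2}}, where \sigma_{\bar{y}}^{2} = \frac{\sum_{x} n_{x} (\bar{y}_{x} - \bar{y})^{2}}{\sum_{x} n_{x}} and \sigma_{y}^{2} = \frac{\sum_{x,i} (y_{xi} - \bar{y})^{2}}{n}, with n = \sum_{x} n_{x},
i.e. the weighted variance of the category means divided by the variance of all samples.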
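As a minimal sketch of this definition, the following Python function computes η^{2} from observations grouped by category. The function name correlation_ratio_squared and the dictionary input layout are assumptions chosen for illustration; they do not come from the source.

```python
from typing import Dict, Hashable, Sequence


def correlation_ratio_squared(groups: Dict[Hashable, Sequence[float]]) -> float:
    """Return eta^2: between-category sum of squares over total sum of squares.

    `groups` maps each category x to its observations y_{xi} (hypothetical layout).
    """
    all_values = [y for ys in groups.values() for y in ys]
    overall_mean = sum(all_values) / len(all_values)

    # Denominator: squared deviations of every observation from the overall mean.
    total_ss = sum((y - overall_mean) ** 2 for y in all_values)
    if total_ss == 0:
        raise ValueError("eta is undefined when all observations are equal")

    # Numerator: squared deviations of category means, weighted by n_x.
    between_ss = sum(
        len(ys) * (sum(ys) / len(ys) - overall_mean) ** 2
        for ys in groups.values()
        if len(ys) > 0
    )
    return between_ss / total_ss


if __name__ == "__main__":
    print(correlation_ratio_squared({"a": [1, 3], "b": [5, 7]}))
```

The common factor 1/n in the two variances cancels, so the sketch works directly with sums of squares.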
It is worth noting that if the relationship between the values of x and the category means \bar{y}_{x} is linear (which is certainly true when there are only two possibilities for x), this will give the same result as the square of Pearson's correlation coefficient; otherwise the correlation ratio will be larger in magnitude. It can therefore be used for judging nonlinear relationships.
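The comparison with the correlation coefficient can be checked numerically. The sketch below assumes the categories carry numeric labels x; the helper names and both toy data sets are invented for illustration. With two categories the two quantities agree, while a three-category set whose means rise and then fall gives a squared correlation coefficient of zero but a correlation ratio of one.

```python
def eta_squared(groups):
    """Between-category sum of squares divided by total sum of squares."""
    values = [y for group in groups.values() for y in group]
    mean = sum(values) / len(values)
    total = sum((y - mean) ** 2 for y in values)
    between = sum(
        len(group) * (sum(group) / len(group) - mean) ** 2
        for group in groups.values()
    )
    return between / total


def pearson_r_squared(groups):
    """Squared Pearson correlation between numeric labels x and observations y."""
    xs = [x for x, group in groups.items() for _ in group]
    ys = [y for group in groups.values() for y in group]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Two categories: the category means are trivially linear in x.
two_groups = {0: [1, 3], 1: [5, 7]}
# Three categories whose means go up and then down (nonlinear in x).
three_groups = {0: [0, 0], 1: [1, 1], 2: [0, 0]}

if __name__ == "__main__":
    print(eta_squared(two_groups), pearson_r_squared(two_groups))
    print(eta_squared(three_groups), pearson_r_squared(three_groups))
```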
Range
The correlation ratio η takes values between 0 and 1. The limit η = 0 represents the special case of no dispersion among the means of the different categories, while η = 1 refers to no dispersion within the respective categories. Note further that η is undefined when all data points of the complete population take the same value, because the overall dispersion in the denominator is then zero.
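The three cases described here can be illustrated with a short Python sketch. The function name eta and the toy data are assumptions made for this example, and returning None for the undefined case is only one possible design choice.

```python
from math import sqrt


def eta(groups):
    """Return the correlation ratio, or None when it is undefined."""
    values = [y for group in groups.values() for y in group]
    mean = sum(values) / len(values)
    total = sum((y - mean) ** 2 for y in values)
    if total == 0:
        return None  # every observation is identical: 0/0
    between = sum(
        len(group) * (sum(group) / len(group) - mean) ** 2
        for group in groups.values()
    )
    return sqrt(between / total)


equal_means = {"a": [1, 3], "b": [0, 4]}    # both category means are 2
no_within = {"a": [1, 1], "b": [3, 3]}      # no spread inside any category
all_identical = {"a": [7, 7], "b": [7]}     # no spread at all

if __name__ == "__main__":
    print(eta(equal_means), eta(no_within), eta(all_identical))
```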
Example
Suppose there is a distribution of test scores in three topics (categories):
 Algebra: 45, 70, 29, 15 and 21 (5 scores)
 Geometry: 40, 20, 30 and 42 (4 scores)
 Statistics: 65, 95, 80, 70, 85 and 73 (6 scores).
Then the subject averages are 36, 33 and 78, with an overall average of 52.
The sums of squares of the differences from the subject averages are 1952 for Algebra, 308 for Geometry and 600 for Statistics, adding to 2860, while the overall sum of squares of the differences from the overall average is 9640. The difference between these two totals, 6780, is also the weighted sum of the squares of the differences between the subject averages and the overall average:
 5(36 − 52)^{2} + 4(33 − 52)^{2} + 6(78 − 52)^{2} = 6780
This gives
 \eta^{2} = \frac{6780}{9640} = 0.7033\ldots
suggesting that most of the overall dispersion is a result of differences between topics, rather than within topics. Taking the square root gives
 \eta = \sqrt{\frac{6780}{9640}} = 0.8386\ldots
Observe that for η = 1 the overall sample dispersion is purely due to dispersion among the categories and not at all due to dispersion within the individual categories. To see this quickly, imagine that all scores within each subject are identical, e.g. five Algebra scores of 36, four Geometry scores of 33 and six Statistics scores of 78.
The limit η = 0 refers to the case in which dispersion among the categories contributes nothing to the overall dispersion. This happens exactly when all category means are the same.
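The arithmetic of this example can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper sums_of_squares and the variable names are assumptions, not part of the source.

```python
from math import sqrt

scores = {
    "Algebra": [45, 70, 29, 15, 21],
    "Geometry": [40, 20, 30, 42],
    "Statistics": [65, 95, 80, 70, 85, 73],
}


def sums_of_squares(groups):
    """Return (within, between, total) sums of squared deviations."""
    values = [y for group in groups.values() for y in group]
    overall_mean = sum(values) / len(values)
    within = 0.0
    between = 0.0
    for group in groups.values():
        mean = sum(group) / len(group)
        within += sum((y - mean) ** 2 for y in group)
        between += len(group) * (mean - overall_mean) ** 2
    total = sum((y - overall_mean) ** 2 for y in values)
    return within, between, total


within, between, total = sums_of_squares(scores)
eta_squared = between / total
eta = sqrt(eta_squared)

# Thought experiment from the text: every score equals its subject average.
constant = {"Algebra": [36] * 5, "Geometry": [33] * 4, "Statistics": [78] * 6}
_, const_between, const_total = sums_of_squares(constant)

if __name__ == "__main__":
    print(within, between, total)  # the text reports 2860, 6780 and 9640
    print(round(eta_squared, 4), round(eta, 4))
    print(const_between / const_total)  # no within-subject spread
```

The within and between sums of squares add up to the total, which is why the between part can be obtained in the text as the difference 9640 − 2860.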
Pearson v. Fisher
The correlation ratio was introduced by Karl Pearson as part of analysis of variance. Ronald Fisher commented:
As a descriptive statistic the utility of the correlation ratio is extremely limited. It will be noticed that the number of degrees of freedom in the numerator of η^{2} depends on the number of the arrays^{[1]}
to which Egon Pearson (Karl's son) responded by saying
Again, a long-established method such as the use of the correlation ratio [§45 The "Correlation Ratio" η] is passed over in a few words without adequate description, which is perhaps hardly fair to the student who is given no opportunity of judging its scope for himself.^{[2]}
References
 [1] Ronald Fisher (1926) Statistical Methods for Research Workers, ISBN 0050021702.
 [2] Pearson E.S. (1926) "Review of Statistical Methods for Research Workers (R. A. Fisher)", Science Progress, 20, 733–734.
Wikimedia Foundation. 2010.