Coefficient of Variation (CV)
What is the Coefficient of Variation?
The coefficient of variation measures variability relative to the mean (or average). It is used to compare the relative dispersion of one set of data with that of another. The data sets being compared may be in the same or different units and may have the same or different means.
When is the Coefficient of Variation Useful?
Suppose you want to compare the relative dispersion of grades in two classes of students, Class A and Class B. The coefficient of variation lets you determine how the grade dispersion in Class A compares with that in Class B. This is one example of how the coefficient of variation can be applied.
Coefficient of Variation
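Measures relative variation, rather than absolute variation such as the standard deviation
Definition of C.V.: C.V. = s / x̄, where s is the standard deviation and x̄ is the mean (often multiplied by 100 and reported as a percentage)
Useful in comparing variation between two distributions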
–Used particularly in comparing laboratory measures to identify those determinations with more variation
–Also used in QC analyses for comparing observers
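As a rough illustration of the two-class comparison described earlier, the sketch below computes the C.V. as a percentage for two sets of grades. The function name `coefficient_of_variation` and both grade lists are invented for this example; they are not data from these notes.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample C.V. as a percentage: 100 * s / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return 100 * statistics.stdev(values) / mean


# Hypothetical grades, made up purely for illustration.
class_a = [70, 75, 80, 85, 90]
class_b = [60, 70, 80, 90, 100]

cv_a = coefficient_of_variation(class_a)
cv_b = coefficient_of_variation(class_b)

print(f"Class A C.V.: {cv_a:.1f}%")
print(f"Class B C.V.: {cv_b:.1f}%")
```

Both invented classes have a mean of 80, but Class B's grades are spread twice as widely, so its C.V. comes out at twice that of Class A.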
Standard Deviation of the Mean (SE)
The standard deviation of the mean (often called the standard error) measures the variation in the means of repeated samples. It is defined as the standard deviation divided by the square root of the sample size: SE = s / √n. To calculate the standard deviation of the mean, do the following:
–Calculate the standard deviation (s).
–Calculate the square root of the sample size (n).
–Divide the standard deviation by the result of step 2.
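The three steps above can be sketched in Python as follows. The function name `standard_error` and the sample values are assumptions made for illustration, and the sketch uses the sample standard deviation (n - 1 denominator).

```python
import math
import statistics


def standard_error(values):
    """Standard deviation of the mean: s / sqrt(n)."""
    s = statistics.stdev(values)      # step 1: sample standard deviation
    root_n = math.sqrt(len(values))   # step 2: square root of the sample size
    return s / root_n                 # step 3: divide s by the result of step 2


# Hypothetical sample, made up purely for illustration.
sample = [4, 8, 6, 5, 7]
se = standard_error(sample)
print(f"SE = {se:.4f}")
```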
Percentiles and Quartiles
Definition of Percentiles
–Given a set of n observations x1, x2, …, xn, the pth percentile P is the value of X such that p percent or less of the observations are less than P and (100-p) percent or less are greater than P
–P10 indicates 10th percentile, etc.
Definition of Quartiles
–First quartile is P25
–Second quartile is median or P50
–Third quartile is P75
Inter-quartile Range
A better description of the distribution's spread than the range
–Range of the middle 50 percent of the distribution
Definition of Inter-quartile Range
–IQR = Q3 - Q1.
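A minimal sketch of the IQR calculation, using the seven cotinine values from the Examples section of these notes. It relies on `statistics.quantiles` from the Python standard library (Python 3.8 or later), whose default "exclusive" method places each quantile at position p*(n+1) in the sorted data and interpolates linearly when that position is not a whole number. That is one quartile convention among several, so other software may report slightly different quartiles.

```python
import statistics

# Cotinine levels in saliva (nmol/l) from the Examples section.
cotinine = [73, 58, 67, 93, 33, 18, 147]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(cotinine, n=4)

iqr = q3 - q1
data_range = max(cotinine) - min(cotinine)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Range = {data_range}")
```

For these data the range is stretched by the single large value 147, while the IQR depends only on the middle half of the observations.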
Percentiles
A "percentile" shows how a single system compares with all other systems. Percentiles range from lowest (1) to highest (99), with 50 corresponding to the median.
The pth percentile (with p expressed as a proportion between 0 and 1) is a value such that roughly 100p% of the data are smaller and 100(1-p)% of the data are larger. Percentiles can be computed for ordinal, interval, or ratio data.
There are three steps for computing a percentile:
•Sort the data from low to high;
•Count the number of values (n);
•Select the p*(n+1)th observation.
If p*(n+1) is not a whole number, go halfway between the two adjacent observations.
If p*(n+1) < 1, select the smallest observation.
If p*(n+1) > n, select the largest observation.
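The rule above, including the halfway convention and the two edge cases, can be sketched as a small Python function. The name `percentile` and the sample scores are assumptions for illustration. The rounding step is an addition of this sketch that guards against floating-point noise; it is not part of the rule itself.

```python
import math


def percentile(values, p):
    """Percentile by the p*(n+1) rule, with p given as a proportion in [0, 1]."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)              # sort the data from low to high
    n = len(ordered)                      # count the number of values
    # 1-based position; rounded so that values like 3.0000000000000004 count as whole.
    position = round(p * (n + 1), 9)
    if position < 1:
        return ordered[0]                 # below the first observation: take the smallest
    if position > n:
        return ordered[-1]                # beyond the last observation: take the largest
    lower = math.floor(position)
    if position == lower:
        return ordered[lower - 1]         # whole number: take that observation
    # Not a whole number: go halfway between the two adjacent observations.
    return (ordered[lower - 1] + ordered[lower]) / 2


# Hypothetical scores, made up purely for illustration.
scores = [12, 15, 11, 19, 14, 17, 13, 18, 16]
print(percentile(scores, 0.50))   # position 5.0 -> 5th sorted value
print(percentile(scores, 0.25))   # position 2.5 -> halfway between 2nd and 3rd
```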
Examples
The following data represent cotinine levels in saliva (nmol/l) after smoking. We want to compute the 50th percentile.
73, 58, 67, 93, 33, 18, 147
•Sorted data: 18, 33, 58, 67, 73, 93, 147
•There are n=7 observations.
•Select the 0.50*(7+1) = 4th observation.
Therefore, the 50th percentile equals 67. Notice that there are three observations larger than 67 and three observations smaller than 67.
Suppose we want to compute the 20th percentile. Notice that p*(n+1) = 0.20*(7+1) = 1.6. This is not a whole number, so we select the value halfway between the 1st and 2nd observations (18 and 33), which is 25.5. (Some people see the 1.6 and think they have to go six tenths of the way to the second value. You can do this if you like, but I think life is too short to worry about such details.)
Suppose we want to compute the 10th percentile. Since 0.10*(7+1) = 0.8 is less than 1, we select the smallest observation, which is 18.
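For readers curious about the "six tenths of the way" alternative, the sketch below compares it with the halfway rule for the 20th percentile of the cotinine data. It assumes Python's `statistics.quantiles` with its default "exclusive" method, which uses the same p*(n+1) position but interpolates linearly between the two neighbouring observations.

```python
import statistics

cotinine = [73, 58, 67, 93, 33, 18, 147]
ordered = sorted(cotinine)

# Halfway rule from the text: position 0.20*(7+1) = 1.6 lies between
# the 1st and 2nd observations, so take their midpoint.
halfway = (ordered[0] + ordered[1]) / 2

# Linear interpolation: go six tenths of the way from the 1st to the 2nd
# observation. With n=5 the cut points are the 20th, 40th, 60th and 80th
# percentiles, so index 0 is the 20th percentile.
interpolated = statistics.quantiles(cotinine, n=5)[0]

print(halfway)       # 25.5
print(interpolated)  # 27.0
```

The two conventions differ by 1.5 nmol/l here; with larger samples, adjacent observations tend to be closer together and the choice usually matters less.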
The five number summary
A five number summary uses percentiles to describe a set of data. The five number summary consists of:
–MAX - the maximum value
–75% - the 75th percentile (3rd quartile)
–50% - the 50th percentile (2nd quartile or median)
–25% - the 25th percentile (1st quartile)
–MIN - the minimum value
The five number summary splits the data into four regions, each of which contains roughly 25% of the data.
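A five number summary can be sketched with the Python standard library as shown below. The function name `five_number_summary` is an assumption for illustration, and the quartiles come from `statistics.quantiles` with its default "exclusive" method (position p*(n+1), with linear interpolation rather than the halfway rule when the position is not a whole number). The data are the cotinine values from the Examples section.

```python
import statistics


def five_number_summary(values):
    """Return MIN, the three quartiles, and MAX as a dictionary."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "MIN": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "MAX": max(values),
    }


cotinine = [73, 58, 67, 93, 33, 18, 147]
summary = five_number_summary(cotinine)
for label, value in summary.items():
    print(f"{label}: {value}")
```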
Summary
In practice, descriptive statistics play a major role
–Always the first 1-2 tables/figures in a paper
–The statistician needs to know about each variable before deciding how to analyze the data to answer the research questions
In any analysis, 90% of the effort goes into setting up the data
–Descriptive statistics are part of that 90%