The correlation coefficient, usually denoted by r, is a measure of the strength of the linear association between two variables (or the strength of the clustering of the data points around a line). Before developing the formula for the correlation coefficient, we look at a graphical interpretation. Look carefully at the scatter plots and corresponding correlation coefficients shown below.
[Figure: six scatter plots, each labeled with its correlation coefficient]
To get a better feel for how correlation coefficients measure the strength of association between two variables, try the game Guessing Correlations. (This is a Java applet which might take a minute to load. Be patient.)
If the correlation coefficient for two variables is positive, what will the scatter plot look like? What if the correlation coefficient is negative?
What do the pictures suggest about data with a correlation coefficient near +1 or -1? What about data with a correlation coefficient near 0?
Look back at the scatter plots you made in Part 1. Estimate the correlation coefficient for Test 1 and Test 2 scores. Do the same for Test 2 and Test 3 scores.
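To compute the correlation coefficient for two variables x and y, we first convert each value of x and each value of y into standard units, then take the average of the products of the corresponding pairs. For n data pairs with means $\bar{x}$ and $\bar{y}$ and standard deviations $\sigma_x$ and $\sigma_y$ (computed by dividing by n), the formula is

$$r = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{\sigma_x}\right)\left(\frac{y_i-\bar{y}}{\sigma_y}\right)$$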
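The standard-units recipe described above can be sketched in Python. This is only an illustrative sketch: the function name `correlation_standard_units` and the sample lists are made up for the example, and the standard deviations are assumed to be the population kind (dividing by n), which is what keeps the average of the products between -1 and +1.

```python
from statistics import fmean, pstdev

def correlation_standard_units(xs, ys):
    """Average of the products of x and y after conversion to standard units."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty lists of equal length")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Population standard deviations (divide by n): an assumption of this sketch.
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    # Convert each value to standard units: (value - mean) / SD.
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    # r is the average of the pairwise products.
    return fmean([a * b for a, b in zip(zx, zy)])

# Hypothetical data lying exactly on a rising line, so r should be very nearly +1.
made_up_x = [1, 2, 3, 4, 5]
made_up_y = [3, 5, 7, 9, 11]
r_example = correlation_standard_units(made_up_x, made_up_y)
print(round(r_example, 6))
```

Reversing the order of one list in the made-up data turns the rising line into a falling one, and the same function should then return a value very near -1.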
In practice, there is an easier way to compute the correlation coefficient by hand, namely via the formula
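One widely used shortcut form, offered here as an assumption about which formula is intended, works with averages of the raw values instead of standard units. Here n is the number of data pairs, $\bar{x}$ and $\bar{y}$ are the means, and $\sigma_x$ and $\sigma_y$ are the standard deviations computed by dividing by n:

$$r = \frac{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} x_i y_i \;-\; \bar{x}\,\bar{y}}{\sigma_x\,\sigma_y}$$

In words: the average of the products minus the product of the averages, divided by the product of the standard deviations.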
Show that the two formulas for the correlation coefficient (one in each of the preceding paragraphs) always give the same value. Why is the second one easier to compute by hand?
Use a formula for r to compute the correlation coefficient for the lists x and y below. Record your calculations in the space provided in your helper application worksheet.
x | 1 | 2 | 3 | 4 |
y | 2 | 3 | 4 | 3 |
Use a scatter plot of the data to decide if the correlation coefficient you computed makes sense. Explain the relationship between your scatter plot and r.
In your helper application worksheet, you will find a short procedure for computing the correlation coefficient via one of the formulas above. Which of the formulas does the procedure employ? Use the procedure to check the computation you made in Question 6.
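As a stand-in for that worksheet procedure (which may well differ), a minimal Python sketch of a shortcut computation could play the same checking role. It assumes the form "average of the products minus the product of the averages, divided by the product of the standard deviations", with divide-by-n standard deviations. The function name `correlation_shortcut` is hypothetical.

```python
from math import sqrt

def correlation_shortcut(xs, ys):
    """r = (mean of x*y - mean of x * mean of y) / (SD of x * SD of y)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty lists of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    # Population variances via "mean of the squares minus square of the mean";
    # max(..., 0.0) guards against tiny negative values from rounding.
    var_x = max(sum(x * x for x in xs) / n - mean_x ** 2, 0.0)
    var_y = max(sum(y * y for y in ys) / n - mean_y ** 2, 0.0)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return (mean_xy - mean_x * mean_y) / (sqrt(var_x) * sqrt(var_y))

# The lists x and y from the table in the text.
x = [1, 2, 3, 4]
y = [2, 3, 4, 3]
r = correlation_shortcut(x, y)
print(f"r = {r:.4f}")
```

Only running sums of x, y, x squared, y squared, and xy are needed, so the same bookkeeping can be laid out as columns in a hand calculation.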
Use your helper application to compute the correlation coefficient for the Test 1 and Test 2 scores, and for the Test 2 and Test 3 scores, from Part 1. How close were the estimates you made above?
modules at math.duke.edu
Copyright CCP and the author(s), 1999