
Linear Correlation and Regression

Part 2: Correlation

The correlation coefficient, usually denoted by r, is a measure of the strength of the linear association between two variables (or the strength of the clustering of the data points around a line). Before developing the formula for the correlation coefficient, we look at a graphical interpretation. Look carefully at the scatter plots and corresponding correlation coefficients shown below.

 
 

To get a better feel for how correlation coefficients measure the strength of association between two variables, try the game Guessing Correlations. (This is a Java applet which might take a minute to load. Be patient.)

  1. If the correlation coefficient for two variables is positive, what will the scatter plot look like? What if the correlation coefficient is negative?

  2. What do the pictures suggest about data with a correlation coefficient near +1 or -1? What about data with a correlation coefficient near 0?

  3. What correlation would you expect for perfectly linear data (all of the data points lie on a line)?

  4. Look back at the scatter plots you made in Part 1. Estimate the correlation coefficient for Test 1 and Test 2 scores. Do the same for Test 2 and Test 3 scores.

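To compute the correlation coefficient for two variables x and y, we first convert each value for x and each value for y into standard units, then take the average of their products. The formula is

$$ r = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sigma_x} \right) \left( \frac{y_i - \bar{y}}{\sigma_y} \right) $$

where n is the number of data points, $\bar{x}$ and $\bar{y}$ are the means of the x and y values, and $\sigma_x$ and $\sigma_y$ are their standard deviations (computed with divisor n, to match the average taken over n products).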
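
The standard-units computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the helper application's procedure: the function names `standard_units` and `correlation` and the sample data are hypothetical, and the standard deviation is assumed to use divisor n.

```python
from math import sqrt


def standard_units(values):
    """Convert values to standard units, assuming the SD uses divisor n."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("standard units are undefined when all values are equal")
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Average of the products of paired values in standard units."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty lists of equal length")
    zx = standard_units(xs)
    zy = standard_units(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))  # prints 0.7746
```

Keeping the conversion to standard units in its own function mirrors the two steps in the definition: standardize each list, then average the products.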

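In practice, there is an easier way to compute the correlation coefficient by hand, namely via a shortcut formula such as

$$ r = \frac{\frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\sigma_x \sigma_y} $$

with means $\bar{x}$, $\bar{y}$ and standard deviations $\sigma_x$, $\sigma_y$ of the two variables.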
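
As a rough sketch of what a hand-friendly computation can look like, the Python below uses one common shortcut: the mean of the products minus the product of the means, divided by the product of the standard deviations. That this is the shortcut intended here is an assumption, as is the use of "mean of squares minus square of the mean" for each variance (divisor n). The names `mean` and `correlation_shortcut` and the data are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def correlation_shortcut(xs, ys):
    """Shortcut form: (mean(xy) - mean(x) * mean(y)) / (SD_x * SD_y)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty lists of equal length")
    mean_x, mean_y = mean(xs), mean(ys)
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    # Variance with divisor n: mean of squares minus square of the mean.
    var_x = mean([x * x for x in xs]) - mean_x ** 2
    var_y = mean([y * y for y in ys]) - mean_y ** 2
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return (mean_xy - mean_x * mean_y) / sqrt(var_x * var_y)


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation_shortcut(xs, ys), 4))  # prints 0.7746
```

Only running totals of x, y, xy, x squared, and y squared are needed, so no value has to be converted to standard units along the way.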

  5. Show that the two calculations (one in each of the preceding paragraphs) for the correlation coefficient are the same. [ Hint ] Why is the second one easier to compute by hand? [ Hint ]

  6. Use a formula for r to compute the correlation coefficient for the lists x and y below. Record your calculations in the place provided in your helper application worksheet.

       x   1   2   3   4
       y   2   3   4   3

  7. Use a scatter plot of the data to decide if the correlation coefficient you computed makes sense. Explain the relationship between your scatter plot and r.

  8. In your helper application worksheet, you will find a short procedure for computing the correlation coefficient via one of the formulas above. Which of the formulas does the procedure employ? Use the procedure to check the computation you made in Question 6.

  9. Use your helper application to compute the correlation coefficient for the Test 1 and Test 2 scores, and for the Test 2 and Test 3 scores, from Part 1. How close were the estimates you made above?


modules at math.duke.edu Copyright CCP and the author(s), 1999