9 Correlations

Correlations explore how two or more variables are related to each other. A correlation analysis attempts to assess whether the changes in one variable systematically vary with the changes in another. There is no dependence relationship between the variables (e.g., IV/DV), and correlation does not imply causation. Most often correlations are used to look at how variables are correlated to each other in a data set, usually with a focus on the variable(s) of interest.

This section explores the three most common correlation tests, one parametric (Pearson) and two non-parametric (Spearman and Kendall). All three tests compute a correlation coefficient that can range between -1 and 1. The closer the value is to the extreme (-1 or 1) the stronger the relationship is.

< 0 - indicates a negative correlation, meaning that as the value of x increases, the value of y decreases.
0 - indicates no association.
> 0 - indicates a positive correlation, meaning that as the value of x increases, the value of y increases as well.

Null hypothesis (H₀): There is no correlation between the two variables. Under H₀ the true correlation coefficient (which depending on the test is denoted r, \(\varrho\), or \(\tau\)) is zero, so the value computed from the sample is expected to be close to zero.

The data set used in the examples below is called mtcars and ships with R as one of its built-in example datasets. The data, covering 11 variables describing cars, was extracted from the 1974 Motor Trend US magazine (Table 9.1).

Table 9.1: Description of variables in the mtcars dataset

Variable	Description
mpg	Miles/gallon (US)
cyl	Number of cylinders
disp	Displacement (in cubic inches)
hp	Horsepower
drat	Rear axle ratio
wt	Weight (in 1000 lb)
qsec	1/4 mile time
vs	Engine (0 = V-shaped, 1 = straight)
am	Transmission (0-automatic, 1-manual)
gear	Number of forward gears
carb	Number of carburetors

The first few rows of the data set are shown in Table 9.2.

Table 9.2: First few rows of the mtcars dataset

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225	105	2.76	3.460	20.22	1	0	3	1
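The table above can be reproduced in base R. This is a minimal sketch: the object name my.cor is taken from the test output shown later in this chapter, but the step that creates it is not shown in the text, so copying mtcars into my.cor is an assumption.

```r
# mtcars ships with base R (package "datasets"), so nothing needs to be downloaded
data(mtcars)

# Assumption: my.cor is simply a copy of the built-in data set
my.cor <- mtcars

# First six rows, as in Table 9.2
head(my.cor)

# Number of rows (cars) and columns (variables)
dim(my.cor)
```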

The question asked is: Is there any relationship between mpg and wt? That is, is there any correlation between the car’s weight and its fuel efficiency?

9.1 Pearson Correlations Test

Parametric test used for two interval or ratio variables. The test requires the following assumptions to be met:

Data is bi-variate normal;
The relationship between variables is linear.

Before running the test, we verify the assumptions for the Pearson Correlations test. An issue with the Pearson Correlations test is that outliers can distort its results; therefore, the sample should be checked for outliers before the test is run.

To test normality we use the Shapiro-Wilk test.
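Calls along the following lines would produce output of the kind shown next. This is a sketch that assumes my.cor is a copy of mtcars; the printed values may show more digits than the output in the text, depending on the session's digits setting.

```r
# Assumption: my.cor is a copy of the built-in mtcars data
my.cor <- mtcars

# Shapiro-Wilk test; H0: the sample comes from a normal distribution
sw.mpg <- shapiro.test(my.cor$mpg)
sw.wt  <- shapiro.test(my.cor$wt)

# Print both results (statistic W and p-value)
sw.mpg
sw.wt
```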


    Shapiro-Wilk normality test

data:  my.cor$mpg
W = 0.95, p-value = 0.1


    Shapiro-Wilk normality test

data:  my.cor$wt
W = 0.94, p-value = 0.09

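The results show that the distribution of data for both variables is not significantly different from the normal distribution (both p-values are > 0.05), so there is no evidence against the assumption of normality. Note that the test is applied to each variable separately, which is a common practical check but does not by itself establish bi-variate normality.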

Besides the Shapiro-Wilk test, data normality can be analyzed using histograms, q-q plots, and the values of skewness and kurtosis for each data set. These alternative ways of studying a data set’s normality are exemplified for other analyses.

The linearity assumption can be checked visually with a scatter plot that has one variable on the X axis and the other variable on the Y axis.

Figure 9.1: Scatterplot of miles/gallon (mpg) vs. weight (wt)

Looking at the scatter plot (Figure 9.1), the assumption of linearity appears to hold: the points follow the red line (regression line) reasonably closely. Should the pattern of points show a different trend (e.g., a curve), the relationship between the two variables is not linear and a different correlation test should be used to analyze it.
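A plot similar to Figure 9.1 can be sketched with base graphics. The axis assignment (wt on X, mpg on Y), the labels, and the point style are assumptions, since the original plotting code is not shown.

```r
# Assumption: my.cor is a copy of the built-in mtcars data
my.cor <- mtcars

# Scatter plot with weight on the X axis and fuel efficiency on the Y axis
plot(my.cor$wt, my.cor$mpg,
     xlab = "Weight (1000 lb)",
     ylab = "Miles/gallon (US)",
     pch  = 19)

# Least-squares regression line of mpg on wt, drawn in red
fit <- lm(mpg ~ wt, data = my.cor)
abline(fit, col = "red")
```

If the points bend away from the red line in a systematic way, that is a visual hint that the linearity assumption does not hold.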

With the assumptions verified, let’s run the Pearson Correlations test.
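The output shown next has the form returned by cor.test() from the stats package. The call below is a sketch, assuming my.cor is a copy of mtcars; res.pearson is a name introduced here for illustration.

```r
# Assumption: my.cor is a copy of the built-in mtcars data
my.cor <- mtcars

# Pearson product-moment correlation test (two-sided by default)
res.pearson <- cor.test(my.cor$mpg, my.cor$wt, method = "pearson")
res.pearson

# Individual pieces of the result
res.pearson$estimate   # correlation coefficient r
res.pearson$p.value    # p-value of the test
res.pearson$conf.int   # 95 percent confidence interval for r
```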


    Pearson's product-moment correlation

data:  my.cor$mpg and my.cor$wt
t = -9.6, df = 30, p-value = 1e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.9338 -0.7441
sample estimates:
    cor 
-0.8677

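The p-value < 0.05 indicates a statistically significant correlation between mpg (fuel efficiency) and wt (car’s weight). The coefficient r = -0.87 (95% confidence interval from -0.93 to -0.74) shows that the relationship is strong and negative: heavier cars tend to have lower fuel efficiency.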

9.2 Spearman Correlations Test

Non-parametric test used for two interval, ratio, or ordinal type variables.

Because the Spearman Correlations test is based on ranks, it does not require normality or linearity (only a monotonic relationship between the variables), so it can be run directly.
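A sketch of the corresponding call, again assuming my.cor is a copy of mtcars. The argument exact = FALSE is an addition made here: the data contain tied values, so an exact p-value is not available, and requesting the approximation directly avoids a warning.

```r
# Assumption: my.cor is a copy of the built-in mtcars data
my.cor <- mtcars

# Spearman rank correlation test; exact = FALSE because of ties in the data
res.spearman <- cor.test(my.cor$mpg, my.cor$wt,
                         method = "spearman", exact = FALSE)
res.spearman

# Correlation coefficient rho and p-value
res.spearman$estimate
res.spearman$p.value
```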


    Spearman's rank correlation rho

data:  my.cor$mpg and my.cor$wt
S = 10000, p-value = 1e-11
alternative hypothesis: true rho is not equal to 0
sample estimates:
    rho 
-0.8864

Based on the computed p-value < 0.05, it can be concluded that the two variables are significantly correlated with each other. The coefficient (rho = -0.89) indicates a strong negative monotonic relationship.

9.3 Kendall Correlations Test

Non-parametric test used for two interval, ratio, or ordinal type variables.

Because the Kendall Correlations test is also based on ranks, it does not require normality or linearity (only a monotonic relationship between the variables), so it can be run directly.
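A sketch of the corresponding call, assuming my.cor is a copy of mtcars. As with the Spearman test, exact = FALSE is an addition made here because of the tied values in the data; with it, the test reports a z statistic based on a normal approximation.

```r
# Assumption: my.cor is a copy of the built-in mtcars data
my.cor <- mtcars

# Kendall rank correlation test; exact = FALSE because of ties in the data
res.kendall <- cor.test(my.cor$mpg, my.cor$wt,
                        method = "kendall", exact = FALSE)
res.kendall

# Correlation coefficient tau and p-value
res.kendall$estimate
res.kendall$p.value
```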


    Kendall's rank correlation tau

data:  my.cor$mpg and my.cor$wt
z = -5.8, p-value = 7e-09
alternative hypothesis: true tau is not equal to 0
sample estimates:
    tau 
-0.7278

Based on the computed p-value < 0.05, it can be concluded that the two variables are significantly correlated with each other. The coefficient (tau = -0.73) again indicates a negative relationship between mpg and wt.