Easy way to compute partial correlation with Python.


Partial correlation measures the strength of the association between two variables while controlling for (removing the effect of) one or more other variables.
It is useful when the plain correlation coefficient would be misleading because another, confounding variable is numerically related to both variables of interest. Computing the partial correlation coefficient controls for that confounding variable and avoids the misleading result.

Like the correlation coefficient, the partial correlation coefficient takes values in the range from –1 to 1, with exactly the same interpretation:
- the value 1 indicates a perfect positive linear relationship.
- the value –1 indicates a perfect negative linear relationship.
- the value 0 indicates that there is no linear relationship.
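One way to see what "controlling for" a variable means is to regress both variables of interest on the confounder and then correlate what is left over (the residuals). The snippet below is a minimal sketch of that idea on synthetic data; the helper name `residuals` and the variables `x`, `y`, `z` are made up for this illustration and are not part of any library.

```python
import numpy as np

def residuals(target, control):
    """Residuals of `target` after a least-squares straight-line fit on `control`."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# synthetic data (an assumption for this sketch): z drives both x and y
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + rng.normal(size=500)
y = z + rng.normal(size=500)

# raw correlation between x and y (inflated by the shared dependence on z)
raw_r = np.corrcoef(x, y)[0, 1]

# partial correlation: correlate what is left of x and y once z is regressed out
partial_r = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print(raw_r, partial_r)
```

Here x and y are related only through z, so the raw correlation should come out clearly positive while the partial correlation should be close to 0.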



Partial correlation - removing the effect of other variables:



import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats

# pip install pingouin
import pingouin as pg


# raw correlations
rmg = .7
rsg = .8
rms = .9

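# partial correlations (first-order formula)
# rho_mg_s: correlation between m and g, controlling for s
rho_mg_s = (rmg - rsg*rms) / ( np.sqrt(1-rsg**2)*np.sqrt(1-rms**2) )
# rho_sg_m: correlation between s and g, controlling for m
rho_sg_m = (rsg - rmg*rms) / ( np.sqrt(1-rmg**2)*np.sqrt(1-rms**2) )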

print(rho_mg_s)
print(rho_sg_m)

OUT:
-0.07647191129018778
0.5461186812727504
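Both lines above are instances of the same first-order formula, r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)), so it can be wrapped in a small helper. The function name `partial_corr_from_r` is hypothetical, chosen just for this sketch:

```python
import numpy as np

def partial_corr_from_r(r_xy, r_xz, r_yz):
    """Partial correlation of x and y controlling for z, from the three raw correlations."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# same raw correlations as above
rmg, rsg, rms = 0.7, 0.8, 0.9

rho_mg_s = partial_corr_from_r(rmg, rms, rsg)  # m vs. g, controlling for s
rho_sg_m = partial_corr_from_r(rsg, rms, rmg)  # s vs. g, controlling for m

print(rho_mg_s)
print(rho_sg_m)
```

Note how the raw correlation of 0.7 between m and g drops to roughly -0.08 once s is controlled for: almost all of that association was carried by s.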

Partial correlation calculation - 3 datasets:



N = 76

# correlated datasets
x1 = np.linspace(1,10,N) + np.random.randn(N)
x2 = x1 + np.random.randn(N)
x3 = x1 + np.random.randn(N)

# let's convert these data to a pandas frame
df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})

# compute the "raw" correlation matrix
cormatR = df.corr()
print(cormatR)

# print out one value (the raw correlation between x2 and x1)
print(' ')
print(cormatR.values[1,0])

# partial correlation between x3 and x2, controlling for x1
pc = pg.partial_corr(data=df,x='x3',y='x2',covar='x1')
print(' ')
print(pc)

Figure: partial correlation calculation from datasets

Visualizing the matrices - correlation VS partial correlation:



fig,ax = plt.subplots(1,2,figsize=(6,3))

# raw correlations
ax[0].imshow(cormatR.values,vmin=-1,vmax=1)
ax[0].set_xticks(range(3))
ax[0].set_yticks(range(3))

# add text (text() takes x = column index, y = row index)
for i in range(3):
    for j in range(3):
        ax[0].text(j,i,np.round(cormatR.values[i,j],2), horizontalalignment='center')

# partial correlations (importing pingouin adds the pcorr() method to DataFrames)
partialCorMat = df.pcorr()
ax[1].imshow(partialCorMat.values,vmin=-1,vmax=1)
ax[1].set_xticks(range(3))
ax[1].set_yticks(range(3))

# add text (text() takes x = column index, y = row index)
for i in range(3):
    for j in range(3):
        ax[1].text(j,i,np.round(partialCorMat.values[i,j],2), horizontalalignment='center')


plt.show()

Figure: correlation (left) vs partial correlation (right)
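If you want the whole partial correlation matrix without pingouin, one standard approach is to invert the covariance matrix and rescale it: the partial correlation of variables i and j, controlling for all the others, is -P_ij / sqrt(P_ii * P_jj), where P is the inverse covariance (precision) matrix. The sketch below uses only NumPy; `partial_corr_matrix` is a hypothetical helper written for this post, and I am not claiming this is what `df.pcorr()` does internally.

```python
import numpy as np

def partial_corr_matrix(data):
    """Partial correlation matrix of the columns of `data` (n_samples x n_variables).

    Each off-diagonal entry controls for all remaining columns.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# same kind of correlated data as above, with a fixed seed for reproducibility
rng = np.random.default_rng(1)
N = 76
x1 = np.linspace(1, 10, N) + rng.standard_normal(N)
x2 = x1 + rng.standard_normal(N)
x3 = x1 + rng.standard_normal(N)
data = np.column_stack([x1, x2, x3])

pcorr = partial_corr_matrix(data)
print(np.round(pcorr, 2))
```

With three variables, each off-diagonal entry controls for exactly one other variable, so it should agree with the first-order formula applied to the raw correlations.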


