# Matthews (phi) correlation coefficient

by Filippo Biscarini

## Brief theoretical background
Matthews correlation coefficient (MCC) –or \(\phi\) (phi) coefficient– is a measure of association between two binary variables (0/1), hence a metric we can use to assess the performance of a binary classification model.
Based on the confusion matrix (TP: # true positives; TN: # true negatives; FP: # false positives; FN: # false negatives):

Predictions | Observed 1 | Observed 0 |
---|---|---|
1 | TP | FP |
0 | FN | TN |

the MCC is calculated as:

\[MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP) \cdot (TP+FN) \cdot (TN+FP) \cdot (TN+FN)}}\]
MCC values range from \(-1\) to \(1\):
- \(-1\): total disagreement between predicted classes and actual classes
- \(0\): prediction no better than random guessing (no predictive ability)
- \(1\): total agreement between predicted classes and actual classes
Like other metrics (e.g. AUC, Cohen’s kappa), MCC is particularly useful when the two classes are imbalanced, a circumstance in which overall accuracy alone would give a distorted idea of predictive performance.
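As a quick numerical illustration (with made-up counts for a rare-positive problem, not from the example below), a classifier can score very high accuracy while its MCC remains modest:

```r
# Hypothetical counts: 1% positives (10 out of 1000), classifier finds only 1
TP = 1; FP = 1; FN = 9; TN = 989
(TP + TN) / (TP + TN + FP + FN)                                    # accuracy = 0.99
(TP*TN - FP*FN) / sqrt((TP+FP) * (FN+TN) * (TP+FN) * (FP+TN))      # MCC ~ 0.22
```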
## Numerical illustration in R
Let’s first write a function that calculates the \(\phi\) coefficient:
```r
## phi (Matthews) correlation coefficient from the confusion-matrix counts
mcc = function(tp, fp, tn, fn) {
  phi = (tp*tn - fp*fn) / sqrt((tp+fp) * (fn+tn) * (tp+fn) * (fp+tn))
  return(phi)
}
```
Let’s say that we want to identify carriers of a rare harmful recessive mutation (frequency = \(1\%\) in the population) by using a small set of genetic markers (e.g. SNPs) measured on a hair/skin/saliva sample. We develop a classifier and obtain the results below:
```r
TP = 18
FP = 7
FN = 25
TN = 375

phi = mcc(TP, FP, TN, FN)
```
The MCC (\(\phi\)) from this example is 0.513, which reflects moderate predictive ability.
In this case the overall classification accuracy would be 0.925 (calculations below).
```r
correct_predictions = TP + TN
total_predictions = TP + TN + FP + FN
accuracy = correct_predictions / total_predictions
```
This accuracy is quite high, and could mislead one into believing that the developed classifier (classification model) has a very good predictive ability. However, this is an imbalanced classification problem (43 carriers vs 382 non-carriers), and the MCC gives a more realistic picture of the predictive ability of the model.
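To make this concrete, here is a minimal sketch (using the `mcc()` function defined above) of a trivial baseline that labels every sample as non-carrier: its accuracy is still close to 0.9, while its MCC is undefined (0/0, i.e. `NaN` in R) because an entire column of the confusion matrix is empty:

```r
# Trivial baseline: predict "non-carrier" for all 425 samples
TP0 = 0; FP0 = 0; FN0 = 43; TN0 = 382
(TP0 + TN0) / (TP0 + TN0 + FP0 + FN0)  # accuracy ~ 0.899
mcc(TP0, FP0, TN0, FN0)                # NaN (no positive predictions at all)
```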
## Relationship between MCC and Pearson correlation coefficient
The \(\phi\) coefficient is mathematically equivalent to the Pearson correlation coefficient calculated on dichotomous variables (0/1: as is the case for binary observations and predicted classes). Pearson correlation would be calculated as below, based on the covariance between predictions and observations and their variances (the following property of expected values is used: \(Cov(x,y)=\mathbb{E}(xy) - \mathbb{E}(x)\mathbb{E}(y)\)):
\[\rho(y,\hat{y}) = \frac{Cov(y,\hat{y})}{\sqrt{Var(y) \cdot Var(\hat{y})}} = \frac{\mathbb{E}\left[(y-\mathbb{E}(y)) \cdot (\hat{y}-\mathbb{E}(\hat{y}))\right]}{\sqrt{Var(y) \cdot Var(\hat{y})}} = \frac{\mathbb{E}(y\hat{y}) - \mathbb{E}(y) \cdot \mathbb{E}(\hat{y})}{\sqrt{Var(y) \cdot Var(\hat{y})}}\]

Now:
- \(\mathbb{E}(y\hat{y}) = \frac{TP}{N}\) (where \(N\) is the sample size): this comes from TP being the intersection between \(y=1\) and \(\hat{y}=1\)
- \(\mathbb{E}(y) = \frac{TP+FN}{N}\): this is the fraction of all “positive” observations (\(y=1\))
- \(\mathbb{E}(\hat{y}) = \frac{TP+FP}{N}\): this is the fraction of all “positive” predictions (\(\hat{y}=1\))
Therefore:
\[N^2 \cdot Cov(y,\hat{y}) = TP \cdot N - (TP+FN) \cdot (TP+FP)\]

Each of \(y\) and \(\hat{y}\) is a Bernoulli (0/1) variable with variance \(p \cdot (1-p)\), hence \(N^2 \cdot Var(y) = (TP+FN) \cdot (FP+TN)\) and \(N^2 \cdot Var(\hat{y}) = (TP+FP) \cdot (FN+TN)\). The \(N^2\) factors cancel out in the ratio.
Finally:
\[\rho(y,\hat{y}) = \frac{TP \cdot N - (TP+FN) \cdot (TP+FP)}{\sqrt{(TP+FN) \cdot (FP+TN) \cdot (TP+FP) \cdot (FN+TN)}}\]

Expanding the numerator with \(N = TP+FP+FN+TN\) gives \(TP \cdot N - (TP+FN) \cdot (TP+FP) = TP \cdot TN - FP \cdot FN\), which is exactly the numerator of the Matthews correlation coefficient (\(\phi\)) formula above: the two coefficients are one and the same.
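As a quick sanity check, we can plug the counts from the carrier-detection example above into the derived expression and verify that it returns the same value as the direct MCC formula:

```r
# Counts from the carrier-detection example above
TP = 18; FP = 7; FN = 25; TN = 375
N = TP + FP + FN + TN

# Pearson-style expression derived above: same result as mcc(), ~0.513
(TP*N - (TP+FN) * (TP+FP)) /
  sqrt((TP+FN) * (FP+TN) * (TP+FP) * (FN+TN))
```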
## Numerical proof of equivalence
We generate two random vectors of 0s and 1s: observed and predicted binary classes.
```r
set.seed(123)
y = sample(x = 0:1, size = 20, replace = TRUE)
y_hat = sample(x = 0:1, size = 20, replace = TRUE)
```
We then calculate the Pearson linear correlation between the two vectors: the correlation is 0.082.
```r
cor(y, y_hat)
```
We can compare this with the MCC calculated on the same data:

- first, we load the R library `mltools`, which includes a function to calculate MCC
- then, we prepare the confusion matrix from the vectors of observed and predicted classes
- finally, we use the `mcc()` function to calculate the MCC
The MCC is indeed equal to the Pearson correlation coefficient: 0.082
library("mltools")
conf_matrix = table(y_hat,y)
conf_matrix = matrix(conf_matrix, ncol = (ncol(conf_matrix)))
mltools::mcc(confusionM = conf_matrix)
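If I read the `mltools` documentation correctly, `mcc()` also accepts the prediction and observation vectors directly (through its `preds` and `actuals` arguments), so the manual confusion matrix step can be skipped; a small sketch:

```r
# Same MCC, passing the two 0/1 vectors directly (assumes preds/actuals args)
mltools::mcc(preds = y_hat, actuals = y)
```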
## Related topics
MCC is one of many metrics that can be used to evaluate the performance of binary classification models when the data are imbalanced: examples include ROC-AUC and Cohen’s kappa. The extension of MCC, and of related metrics, to multi-class classification problems is also possible. These may be topics of future posts.