# Matthews (phi) correlation coefficient

by Filippo Biscarini

## Brief theoretical background
Matthews correlation coefficient (MCC) –or \(\phi\) (phi) coefficient– is a measure of association between two binary variables (0/1), hence a metric we can use to assess the performance of a binary classification model.
Based on the confusion matrix (TP: # true positives; TN: # true negatives; FP: # false positives; FN: # false negatives):

Predictions | Observed 1 | Observed 0 |
---|---|---|
1 | TP | FP |
0 | FN | TN |

the MCC is calculated as:

\[MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP) \cdot (TP+FN) \cdot (TN+FP) \cdot (TN+FN)}}\]
MCC values range from \(-1\) to \(1\):
- \(-1\): total disagreement between predicted classes and actual classes
- \(0\): prediction no better than random guessing (no predictive ability)
- \(1\): total agreement between predicted classes and actual classes
Like other metrics (e.g. AUC, Cohen’s kappa), MCC is particularly useful when the two classes are imbalanced, a circumstance in which overall accuracy alone would give a distorted idea of predictive performance.
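As a quick numerical illustration (with made-up counts for a rare-positive problem, not from the example below), a classifier can score very high accuracy while its MCC remains modest:

```r
# Hypothetical counts: 1% positives (10 out of 1000), classifier finds only 1
TP = 1; FP = 1; FN = 9; TN = 989
(TP + TN) / (TP + TN + FP + FN)                                    # accuracy = 0.99
(TP*TN - FP*FN) / sqrt((TP+FP) * (FN+TN) * (TP+FN) * (FP+TN))      # MCC ~ 0.22
```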
## Numerical illustration in R
Let’s first write a function that calculates the \(\phi\) coefficient:
```r
## phi (Matthews) correlation coefficient from the confusion-matrix counts
mcc = function(tp, fp, tn, fn) {
  phi = (tp*tn - fp*fn) / sqrt((tp+fp) * (fn+tn) * (tp+fn) * (fp+tn))
  return(phi)
}
```
Let’s say that we want to identify carriers of a rare harmful recessive mutation (frequency = \(1\%\) in the population) by using a small set of genetic markers (e.g. SNPs) measured on a hair/skin/saliva sample. We develop a classifier and obtain the results below:
```r
TP = 18
FP = 7
FN = 25
TN = 375

phi = mcc(TP, FP, TN, FN)
```
The MCC (\(\phi\)) from this example is 0.513, which reflects moderate predictive ability.
In this case the overall classification accuracy would be 0.925 (calculations below).
```r
correct_predictions = TP + TN
total_predictions = TP + TN + FP + FN
accuracy = correct_predictions / total_predictions
```
This accuracy is quite high, and could mislead one into believing that the developed classifier (classification model) has a very good predictive ability. However, this is an imbalanced classification problem (43 carriers vs 382 non-carriers), and the MCC gives a more realistic picture of the predictive ability of the model.
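To make this concrete, here is a minimal sketch (using the `mcc()` function defined above) of a trivial baseline that labels every sample as non-carrier: its accuracy is still close to 0.9, while its MCC is undefined (0/0, i.e. `NaN` in R) because an entire column of the confusion matrix is empty:

```r
# Trivial baseline: predict "non-carrier" for all 425 samples
TP0 = 0; FP0 = 0; FN0 = 43; TN0 = 382
(TP0 + TN0) / (TP0 + TN0 + FP0 + FN0)  # accuracy ~ 0.899
mcc(TP0, FP0, TN0, FN0)                # NaN (no positive predictions at all)
```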
## Relationship between MCC and Pearson correlation coefficient
The \(\phi\) coefficient is mathematically equivalent to the Pearson correlation coefficient calculated on dichotomous variables (0/1: as is the case for binary observations and predicted classes). Pearson correlation would be calculated as below, based on the covariance between predictions and observations and their variances (the following property of expected values is used: \(Cov(x,y)=\mathbb{E}(xy) - \mathbb{E}(x)\mathbb{E}(y)\)):
\[\rho(y,\hat{y}) = \frac{Cov(y,\hat{y})}{\sqrt{Var(y) \cdot Var(\hat{y})}} = \frac{\mathbb{E}\left[(y-\mathbb{E}(y)) \cdot (\hat{y}-\mathbb{E}(\hat{y}))\right]}{\sqrt{Var(y) \cdot Var(\hat{y})}} = \frac{\mathbb{E}(y\hat{y}) - \mathbb{E}(y) \cdot \mathbb{E}(\hat{y})}{\sqrt{Var(y) \cdot Var(\hat{y})}}\]

Now:
- \(\mathbb{E}(y\hat{y}) = \frac{TP}{N}\) (where \(N\) is the sample size): this comes from TP being the intersection between \(y=1\) and \(\hat{y}=1\)
- \(\mathbb{E}(y) = \frac{TP+FN}{N}\): this is the fraction of all “positive” observations (\(y=1\))
- \(\mathbb{E}(\hat{y}) = \frac{TP+FP}{N}\): this is the fraction of all “positive” predictions (\(\hat{y}=1\))
Therefore:
\[N^2 \cdot Cov(y,\hat{y}) = TP \cdot N - (TP+FN) \cdot (TP+FP)\]

Each of \(y\) and \(\hat{y}\) is a Bernoulli (0/1) variable with variance \(p \cdot (1-p)\), hence \(N^2 \cdot Var(y) = (TP+FN) \cdot (FP+TN)\) and \(N^2 \cdot Var(\hat{y}) = (TP+FP) \cdot (FN+TN)\). The \(N^2\) factors cancel out in the ratio.
Finally:
\[\rho(y,\hat{y}) = \frac{TP \cdot N - (TP+FN) \cdot (TP+FP)}{\sqrt{(TP+FN) \cdot (FP+TN) \cdot (TP+FP) \cdot (FN+TN)}}\]

Expanding the numerator with \(N = TP+FP+FN+TN\) gives \(TP \cdot N - (TP+FN) \cdot (TP+FP) = TP \cdot TN - FP \cdot FN\), which is exactly the numerator of the Matthews correlation coefficient (\(\phi\)) formula above: the two coefficients are one and the same.
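As a quick sanity check, we can plug the counts from the carrier-detection example above into the derived expression and verify that it returns the same value as the direct MCC formula:

```r
# Counts from the carrier-detection example above
TP = 18; FP = 7; FN = 25; TN = 375
N = TP + FP + FN + TN

# Pearson-style expression derived above: same result as mcc(), ~0.513
(TP*N - (TP+FN) * (TP+FP)) /
  sqrt((TP+FN) * (FP+TN) * (TP+FP) * (FN+TN))
```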
## Numerical proof of equivalence
We generate two random vectors of 0s and 1s: observed and predicted binary classes.
```r
set.seed(123)
y = sample(x = 0:1, size = 20, replace = TRUE)
y_hat = sample(x = 0:1, size = 20, replace = TRUE)
```
We then calculate the Pearson linear correlation between the two vectors: the correlation is 0.082.
```r
cor(y, y_hat)
```
We can compare this with the MCC calculated on the same data:

- first, we load the R library `mltools`, which includes a function to calculate MCC
- then, we prepare the confusion matrix from the vectors of observed and predicted classes
- finally, we use the `mcc()` function to calculate the MCC
The MCC is indeed equal to the Pearson correlation coefficient: 0.082
library("mltools")
conf_matrix = table(y_hat,y)
conf_matrix = matrix(conf_matrix, ncol = (ncol(conf_matrix)))
mltools::mcc(confusionM = conf_matrix)
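If I read the `mltools` documentation correctly, `mcc()` also accepts the prediction and observation vectors directly (through its `preds` and `actuals` arguments), so the manual confusion matrix step can be skipped; a small sketch:

```r
# Same MCC, passing the two 0/1 vectors directly (assumes preds/actuals args)
mltools::mcc(preds = y_hat, actuals = y)
```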
## Related topics
MCC is one of many metrics that can be used to evaluate the performance of binary classification models when the data are imbalanced: examples include ROC-AUC and Cohen’s kappa. The extension of MCC, and of related metrics, to multi-class classification problems is also possible. These may be topics of future posts.