
$R^2$ metric for regression

March 12, 2025 · 6 min read

In this post I derive the coefficient of determination ($R^2$) metric for regression, explain its interpretations, and show its connection to explained variance.

$R^2$: Coefficient of determination

There are two main classes of supervised learning problems in ML: classification and regression. In a regression problem we are given a training set: a data matrix $X^{train}$ (which consists of rows $X^{train}_i$) and a corresponding vector $y^{train}$ of targets $y^{train}_i$. We need to train a regression model such that, given test data $X^{test}$, the model produces the best approximations $\hat{y}^{test}$ of the true targets $y^{test}$.

How do we quantify whether a regression model is good or bad? The main metric in regression problems is the coefficient of determination $R^2$.

Let's say that the true values $y_i$ are sums of their approximations $\hat{y}_i$ and error terms $e_i$:

$$y_{i} = \hat{y}_{i} + e_{i}$$
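To make the setup concrete, here is a minimal sketch (assuming numpy is available; the synthetic data and all variable names are illustrative, not part of any derivation) that fits an ordinary least squares line and splits each target into an approximation plus an error term. Later snippets reuse `y`, `y_hat` and `e` from here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data: y = 2x + 1 + noise
x = rng.uniform(0.0, 10.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=100)

# Fit a line by ordinary least squares
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept  # approximations \hat{y}_i
e = y - y_hat                  # error terms e_i

assert np.allclose(y, y_hat + e)  # y_i = \hat{y}_i + e_i by construction
```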

As is often done in ML, the coefficient of determination $R^2$ is derived through a decomposition of variance: let us decompose the variance of the target, $Var(y)$, into the variance of the target approximations, the variance of the errors, and a covariance term:

$$Var(y) = Var(\hat{y}) + Var(e) + 2\,Cov(\hat{y}, e)$$

A good enough model should achieve zero covariance between the approximations and the errors: $Cov(\hat{y}, e) = 0$. Given this, the expression above simplifies to:

$$Var(y) = Var(\hat{y}) + Var(e)$$
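The zero-covariance assumption is not automatic for every model, but for an ordinary least squares fit with an intercept it holds exactly on the training data (the residuals are orthogonal to the fitted values). A quick numeric check, reusing the sketch above:

```python
# In-sample OLS residuals are uncorrelated with the fitted values,
# so the covariance term vanishes (up to floating-point error):
print(np.cov(y_hat, e, ddof=0)[0, 1])  # ~0.0

# Hence the variance decomposition holds for this fit:
assert np.isclose(np.var(y), np.var(y_hat) + np.var(e))
```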

Now recall the definition of variance, $Var(y) = \frac{1}{N}\sum_i (y_i - \bar{y})^2$, where $\bar{y}$ is the expectation (mean) of $y_i$, and substitute it into the expression above:

$$\frac{1}{N}\sum_i ( y_{i} - \bar{y})^{2} = \frac{1}{N} \sum_i ( \hat{y}_{i} - \bar{\hat{y}})^{2} + \frac{1}{N} \sum_i (e_{i} - \bar{e})^{2}$$

Again, our model is assumed to be unbiased, with zero expected error: $\bar{e} = 0$ (which also implies $\bar{y} = \bar{\hat{y}}$). Multiplying through by $N$ and expanding $y_i = \hat{y}_i + e_i$:

$$\sum_i ( y_{i} - \bar{y})^{2} = \sum_i \big( (\hat{y}_i - \bar{\hat{y}}) + e_i \big)^2 = \sum_i (\hat{y}_i - \bar{\hat{y}})^2 + \cancel{2 \sum_i (\hat{y}_i - \bar{\hat{y}}) e_i} + \sum_i e_i^2 = \sum_i ( \hat{y}_{i} - \bar{\hat{y}})^{2} + \sum_i e^{2}_{i},$$

where the crossed-out term vanishes because $Cov(\hat{y}, e) = 0$.

Now we introduce several terms used in the definition of $R^2$:

$\sum_i ( y_{i} - \bar{y})^{2} = TSS = \text{Total Sum of Squares}$ - the variance of the true targets (up to the factor $\frac{1}{N}$)

$\sum_i ( \hat{y}_{i} - \bar{\hat{y}})^{2} = ESS = \text{Explained Sum of Squares}$ - the variance of the target approximations

$\sum_i e^{2}_{i} = SSR = \text{Sum of Squared Residuals}$ - the variance of the errors

The coefficient of determination $R^2$ is defined as the ratio of the Explained Sum of Squares (ESS) to the Total Sum of Squares (TSS):

$$R^{2} = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} = 1 - \frac{\sum_i e^{2}_{i}}{\sum_i ( y_{i} - \bar{y})^{2}}$$

This is the main metric of model quality in regression problems: the closer it is to 1, the better. In the next section we show that it can also be interpreted as the square of Pearson's correlation between the target and its approximation.
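Continuing the sketch, we can compute TSS, ESS and SSR directly, check the decomposition, and confirm that both forms of $R^2$ agree with scikit-learn's `r2_score` (exact agreement of $ESS/TSS$ with $1 - SSR/TSS$ relies on the in-sample OLS properties checked above):

```python
from sklearn.metrics import r2_score

TSS = np.sum((y - y.mean()) ** 2)          # total sum of squares
ESS = np.sum((y_hat - y_hat.mean()) ** 2)  # explained sum of squares
SSR = np.sum((y - y_hat) ** 2)             # sum of squared residuals

assert np.isclose(TSS, ESS + SSR)  # decomposition from the previous section

print(ESS / TSS)           # R^2 as the explained share of variance
print(1.0 - SSR / TSS)     # R^2 via residuals
print(r2_score(y, y_hat))  # same value from scikit-learn
```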

$R^2$ as squared correlation

The coefficient of determination $R^2$ equals the squared Pearson correlation coefficient between the predicted values $\hat{y}$ and the actual values $y$. Let us prove this fact.

We start with the definition of the coefficient of determination $R^2$:

$$R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

We need to show that it equals the square of the Pearson correlation $r$ between $y$ and $\hat{y}$.

The Pearson correlation coefficient $r$ between $y$ and $\hat{y}$ is:

$$r_{y\hat{y}} = \frac{Cov(y, \hat{y})}{\sqrt{ Var(y) \cdot Var(\hat{y}) }} = \frac{\sum_i (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_i (y_i - \bar{y})^2 \sum_i (\hat{y}_i - \bar{\hat{y}})^2}}$$

Squaring it:

$$r^2_{y\hat{y}} = \frac{Cov^2(y, \hat{y})}{ Var(y)\,Var(\hat{y}) }$$

Substituting $y = \hat{y} + e$ and expanding the covariance:

$$r^2_{y\hat{y}} = \frac{Cov^2(\hat{y} + e, \hat{y})}{ Var(y)\,Var(\hat{y}) } = \frac{\big(Cov(\hat{y}, \hat{y}) + Cov(e, \hat{y})\big)^2}{ Var(y)\,Var(\hat{y}) }$$

Since $Cov(e, \hat{y}) = 0$ and $Cov(\hat{y}, \hat{y}) = Var(\hat{y})$:

$$r^2_{y\hat{y}} = \frac{Var^2(\hat{y})}{ Var(y)\,Var(\hat{y}) } = \frac{Var(\hat{y})}{ Var(y) } = \frac{\sum_i (\hat{y}_i - \bar{\hat{y}})^2}{ \sum_i (y_i - \bar{y})^2 } = \frac{ \sum_i (y_i - \bar{y})^2 - \sum_i e_i^2 }{ \sum_i (y_i - \bar{y})^2 } = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2 }{ \sum_i (y_i - \bar{y})^2 } = R^2,$$

where replacing $\sum_i (\hat{y}_i - \bar{\hat{y}})^2$ with $\sum_i (y_i - \bar{y})^2 - \sum_i e_i^2$ uses the decomposition proved in the previous section.
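A numeric illustration, reusing `y` and `y_hat` from the sketch: for the in-sample OLS fit, where $\bar{e} = 0$ and $Cov(\hat{y}, e) = 0$ hold, the squared Pearson correlation matches $R^2$; a constant bias (the 5.0 below is an arbitrary illustrative shift) breaks the equality, leaving the correlation unchanged while lowering $R^2$:

```python
import numpy as np
from sklearn.metrics import r2_score

r = np.corrcoef(y, y_hat)[0, 1]    # Pearson correlation between y and y_hat
print(r ** 2, r2_score(y, y_hat))  # equal up to floating-point error

# Shifting the predictions by a constant violates e_bar = 0:
print(np.corrcoef(y, y_hat + 5.0)[0, 1] ** 2)  # squared correlation unchanged
print(r2_score(y, y_hat + 5.0))                # R^2 drops
```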

Connection to explained variance

In the text above we used the assumption that the model is unbiased, i.e. that the expectation of the error is 0.

When this condition holds, explained variance is exactly the same as the coefficient of determination.

This need not be the case in general: if the errors have a nonzero mean $\bar{e} \neq 0$, we have to subtract that mean (the model's bias) to obtain the explained variance:

$$\text{explained variance} = 1 - \frac{Var(y - \hat{y})}{Var(y)} = 1 - \frac{\sum_i (e_i - \bar{e})^2}{\sum_i (y_i - \bar{y})^2},$$

which coincides with $R^2 = 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2}$ exactly when $\bar{e} = 0$.
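scikit-learn exposes both metrics, so the difference is easy to see. A short sketch, again reusing `y` and `y_hat` from above (the constant shift of 3.0 is an arbitrary illustrative bias):

```python
from sklearn.metrics import explained_variance_score, r2_score

# Unbiased in-sample predictions: the two metrics coincide
print(r2_score(y, y_hat))
print(explained_variance_score(y, y_hat))

# Biased predictions: explained variance subtracts the mean error,
# while R^2 penalizes it
y_biased = y_hat + 3.0
print(r2_score(y, y_biased))                  # drops
print(explained_variance_score(y, y_biased))  # unchanged
```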

Boris Burkov

Written by Boris Burkov, who lives in Moscow, Russia, loves to take part in the development of cutting-edge technologies, reflects on how the world works, and admires the giants of the past. You can follow me on Telegram.