In this article, we will discuss R-squared and adjusted R-squared: how they differ, what they indicate, their formulas, and more.
The R-squared is a statistical measure that represents the proportion of the variance in a dependent variable that is explained by an independent variable or variables in a regression model. It’s a metric for determining how close the data are to the fitted regression line. In other words, a linear model explains a proportion of the variation in the response variable, and we call that proportion the R-squared.
Some other names for R-squared are:

- Coefficient of determination
- Coefficient of multiple determination (in multiple regression)
The following is the formula for calculating R-squared:

R² = 1 - (RSS / TSS)

where:

- RSS = the residual sum of squares (the sum of the squared differences between the observed values and the model’s predicted values)
- TSS = the total sum of squares (the sum of the squared differences between the observed values and their mean)
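To make the formula concrete, here is a minimal Python sketch using NumPy; the observed and predicted values are made-up numbers purely for illustration:

```python
import numpy as np

# Hypothetical sample data: observed responses and some model's predictions
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])        # observed response values
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])   # fitted values from the model

rss = np.sum((y - y_pred) ** 2)      # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares around the mean

r_squared = 1 - rss / tss
print(f"R-squared: {r_squared:.4f}")  # proportion of variance explained
```

Any fitted model that produces predictions can be scored this way: the closer the predictions track the observations, the smaller RSS is relative to TSS, and the higher the R-squared.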
We can interpret an R-squared value in the following way:
Firstly, you need to be aware that R-squared always falls between 0 and 100%.

A value of 0% indicates that the model explains none of the variability in the response data around its mean, whereas 100% means that the model explains all of that variability.

In general, a higher R-squared signifies that the model fits your data more closely, while a lower R-squared suggests that the model leaves more of the variation unexplained.
Now, I’m sure you’re thinking that a high R-squared is what you should aim for, since it sounds like a purely positive indicator. However, a high R-squared does not automatically imply that the regression model is excellent. The kinds of variables in the model, the units in which those variables are measured, and any data transformations applied all affect the quality of this statistical measure.
The following are some advantages of R-squared:

- It provides an easily interpreted measure, on a 0 to 100% scale, of how much of the variation in the response variable the model explains.
- It is simple to calculate and is reported by virtually every regression tool, which makes models easy to summarize and compare.
The following are some limitations of R-squared:

- A high R-squared does not guarantee that the model is a good one, and a low R-squared does not always mean the model is poor.
- It never decreases when predictors are added, even useless ones, so it can reward overfitting.
- It does not indicate whether the coefficient estimates and predictions are biased.
An adjusted R-squared is a refined version of R-squared that takes into consideration predictors in a regression model that are not actually significant. In simple words, the adjusted R-squared indicates whether or not adding more predictors improves a regression model: it determines whether those additional predictors genuinely contribute to it. It therefore tests the predictive power of regression models with varying numbers of predictors, and it helps in comparing the goodness-of-fit of regression models with different numbers of independent variables. More generally, goodness-of-fit testing asks how well sample data matches a hypothesized distribution for the population; the chi-square test is one of the most prevalent goodness-of-fit tests.
The following is the formula to calculate the adjusted R-squared:

Adjusted R² = 1 - [(1 - R²)(n - 1) / (n - k - 1)]

where:

- R² = the R-squared of the model
- n = the total sample size (the number of observations)
- k = the number of independent variables (predictors)
In other words, an adjusted R-squared can be determined from the R-squared value, the number of independent variables, and the total sample size.
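As a quick illustration, here is a minimal Python sketch that applies this formula; the R-squared, sample size, and predictor count below are made-up values for demonstration:

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R-squared from the R-squared, sample size n, and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical example: an R-squared of 0.85 from 50 observations and 3 predictors
print(adjusted_r_squared(0.85, n=50, k=3))  # ~0.8402, slightly below the raw 0.85
```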
You’re probably wondering whether an adjusted R-squared increases or decreases on its own, and if so, what causes it. Basically, an adjusted R-squared only increases if a new predictor improves the model more than would be expected by chance. When a predictor improves the model less than would be expected by chance, the adjusted R-squared falls. Let me explain this in simple terms: if you add too many useless variables to a model, the adjusted R-squared will decrease; however, if you add variables that are actually useful, the adjusted R-squared will increase. An adjusted R-squared is always less than or equal to the R-squared.
The adjusted R-squared accounts for the number of independent variables in the model, whereas the R-squared does not. We just saw how an adjusted R-squared rises when useful variables are added to the model, and falls when useless ones are. Always keep in mind that the R-squared, by contrast, rises with each predictor added to a model: unlike an adjusted R-squared, the R-squared never decreases, so the more variables you add, the better the model appears to fit, even when the new variables carry no real explanatory power. The sketch below illustrates the difference.
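Here is a minimal Python sketch of that behavior, using NumPy and synthetic data (the data-generating process and seed are assumptions for illustration). It fits an ordinary least-squares model, then adds a pure-noise predictor: the R-squared can only go up, while the adjusted R-squared is penalized for the extra variable:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100

# Synthetic data: y depends on one useful predictor plus random noise
x_useful = rng.normal(size=n)
y = 2.0 + 3.0 * x_useful + rng.normal(scale=1.0, size=n)
x_noise = rng.normal(size=n)  # a useless predictor, unrelated to y

def fit_and_score(X, y):
    """Fit OLS with an intercept; return (R-squared, adjusted R-squared)."""
    X1 = np.column_stack([np.ones(len(y)), X])     # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares fit
    y_pred = X1 @ beta
    rss = np.sum((y - y_pred) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    k = X.shape[1]                                 # number of predictors
    adj_r2 = 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)
    return r2, adj_r2

r2_base, adj_base = fit_and_score(x_useful.reshape(-1, 1), y)
r2_more, adj_more = fit_and_score(np.column_stack([x_useful, x_noise]), y)

print(f"1 predictor : R2={r2_base:.4f}, adjusted R2={adj_base:.4f}")
print(f"+ noise var : R2={r2_more:.4f}, adjusted R2={adj_more:.4f}")
# R-squared never decreases when a predictor is added; the adjusted R-squared
# typically falls here because the noise variable adds no real signal.
```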