Background Info
Regressions and Residuals
When performing a regression analysis, a line of best fit is formed that attempts to represent the data and the relationship between the variables in question. This line rarely passes through every data point; if it did, the regression would fit the data perfectly. Instead, we get an estimate of best fit, and the vertical distance between each data point and the line is referred to as the residual: the error between the actual data point and the predicted value at that position. In other words, the residual tells us how far off our prediction is. Understanding the errors between the predicted and actual values is an important step in any regression analysis, and the goal of this post is to provide an introduction to this topic.
A residual is the difference between the observed value and the mean value that the model predicts for that observation. Residuals are useful for showing how well or poorly a model represents the data and, more importantly, whether the linear regression assumptions are met.
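To make the definition concrete, here is a minimal sketch that fits a straight line and computes the residuals as observed minus predicted. The data values are made up for illustration and are not from this project, and np.polyfit stands in for whatever fitting routine the analysis actually uses.

```python
import numpy as np

# Hypothetical toy data (not from the project): x is the predictor, y the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a least-squares line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Predicted value on the line for each observation.
y_hat = slope * x + intercept

# Residual = observed value - predicted value (the vertical distance to the line).
residuals = y - y_hat

print(residuals)
```

For a least-squares line with an intercept, the residuals sum to zero up to floating-point error, so it is their size and pattern that carry information, not their total.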
Why is this important?
Residuals are an important concept in regression analysis and are used to build various diagnostic tools that help determine which type of regression model to use.
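As a rough illustration of how residuals can point toward a different model, the sketch below fits both a straight line and a quadratic to deliberately curved data and compares the residuals. The data and the helper fit_residuals are hypothetical and chosen only to make the contrast obvious.

```python
import numpy as np

# Hypothetical data with a curved relationship (y = x^2, no noise).
x = np.linspace(-3.0, 3.0, 13)
y = x ** 2

def fit_residuals(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return the residuals."""
    coeffs = np.polyfit(x, y, deg=degree)
    return y - np.polyval(coeffs, x)

linear_resid = fit_residuals(x, y, degree=1)
quadratic_resid = fit_residuals(x, y, degree=2)

# Largest absolute residual under each model.
print(np.abs(linear_resid).max())
print(np.abs(quadratic_resid).max())
```

The straight-line residuals are large and follow a systematic curve in x, while the quadratic residuals are essentially zero. A visible pattern like this in the residuals is a hint that the chosen model form does not match the data.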
The project goals
Data
Analysis
This project was written in Python. The matplotlib and seaborn libraries were used
to visualize the data. pandas was used to wrangle the data, while NumPy and statsmodels
were used in the calculations and machine learning models.