Credit Score and other factors that influence loan approval including years of credit history
Executive Summary
The main aim of this report is to evaluate the relationship between credit score and other factors that influence loan approval, including years of credit history, revolving balance, revolving utilization and home ownership status. The report identifies the natural dependent variable to be predicted, which is the credit score, and another variable, the credit approval status, which can also act as the dependent variable in another setting. The report further identifies the type of relationship and the type of model that best suits the data. Additionally, it evaluates the need for interaction terms and determines whether a categorical approach would be useful. Multicollinearity and residual analyses are then carried out to determine the reliability of the findings, and finally a variable selection procedure is specified in order to produce the best model for analysing the data.
Regression Analysis
The data relate to credit approval decisions, in particular whether or not a client is approved for credit depending on their credit score, years of credit history, revolving balance, revolving utilization and home ownership status. The natural dependent variable to be predicted is the credit score, which depends on the years of credit history, the revolving balance and the revolving utilization. Apart from the credit score, the approval decision can also be used as a categorical dependent variable. It should however be noted that applying ordinary least squares to a binary dependent variable produces a linear probability model (LPM), which has several well-known weaknesses. First, the regression line will not be a good fit for the data, which implies that usual measures such as the coefficient of determination (R²) are more often than not unreliable. Moreover, LPMs are characterised by heteroskedasticity and can produce fitted values that are greater than 1 or less than 0, which makes them difficult to interpret because the estimates are probabilities and should lie between zero and one. The error term in such models is also non-normal, because for any given set of predictor values it can take only two values. Finally, the relationship between the variables is likely to be non-linear, which suggests that a different type of regression function would be required to fit the data more accurately, for instance an 'S'-shaped curve such as the logistic function.
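The contrast between a linear probability model and an 'S'-shaped alternative can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (the report's raw observations are not reproduced here), utilization is assumed to be measured as a proportion between 0 and 1, and the logistic fit is a hand-written Newton-Raphson routine rather than the method used in the report.

```python
import numpy as np

# Hypothetical data: approval is more likely at low revolving utilization.
rng = np.random.default_rng(0)
utilization = rng.uniform(0.0, 1.0, size=200)
true_prob = 1.0 / (1.0 + np.exp(-(4.0 - 10.0 * utilization)))
approved = (rng.uniform(size=200) < true_prob).astype(float)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(utilization), utilization])

# Linear probability model: ordinary least squares on the 0/1 outcome.
beta_lpm, *_ = np.linalg.lstsq(X, approved, rcond=None)


def fit_logit(X, y, iterations=25):
    """Fit a logistic regression by Newton-Raphson iterations."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


beta_logit = fit_logit(X, approved)

# Compare fitted probabilities on a grid of utilization values.
grid_values = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
grid = np.column_stack([np.ones_like(grid_values), grid_values])
lpm_pred = grid @ beta_lpm
logit_pred = 1.0 / (1.0 + np.exp(-grid @ beta_logit))

for u, a, b in zip(grid_values, lpm_pred, logit_pred):
    print(f"utilization={u:.2f}  LPM={a:.3f}  logistic={b:.3f}")
```

On simulated data of this kind the straight line leaves the unit interval at high utilization, whereas the logistic curve cannot, which is the interpretability problem described above.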
The relationship between the credit score and the other variables is linear, as illustrated by the Excel output below. A first-order model is appropriate, which implies that the independent variables are included only in the first power.
Table 1: SUMMARY OUTPUT
Regression Statistics | |
Multiple R | 0.815255 |
R Square | 0.664641 |
Adjusted R Square | 0.642769 |
Standard Error | 53.80668 |
Observations | 50 |
Table 2: ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 3 | 263940.8 | 87980.26 | 30.38875 | 5.53E-11 |
Residual | 46 | 133177.3 | 2895.159 | | |
Total | 49 | 397118.1 | | | |
Table 3: COEFFICIENTS
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 771.8923 | 26.91241 | 28.68165 | 5.37E-31 | 717.7204 | 826.0641 |
Years of Credit History | -2.09848 | 1.490347 | -1.40805 | 0.165839 | -5.09839 | 0.901431 |
Revolving Balance | 0.001565 | 0.000887 | 1.764622 | 0.084266 | -0.00022 | 0.003351 |
Revolving Utilization | -246.228 | 27.93194 | -8.81527 | 1.91E-11 | -302.452 | -190.004 |
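From the foregoing, the multiple linear regression equation can be expressed as follows:
Credit Score = 771.8923 - 2.09848 × (Years of Credit History) + 0.001565 × (Revolving Balance) - 246.228 × (Revolving Utilization)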
It should however be noted that the coefficients for the years of credit history and the revolving balance are not statistically significant. The P-values for these variables are 0.165839 and 0.084266 respectively, both of which are greater than the 0.05 significance level. Consequently, these coefficients cannot be used to make reliable inferences about the relationship between the credit score and either the years of credit history or the revolving balance. This is reiterated by the line fit plots in Appendix 2. By contrast, the coefficients for the intercept and the revolving utilization are statistically significant, which implies that it is only safe to conclude that there is an inverse relationship between the credit score and the revolving utilization. Moreover, the significance F (5.53E-11) is far smaller than the significance level, which suggests that the regression as a whole is statistically significant at the 95% confidence level. The coefficient of determination indicates that 66.46% of the variation in the credit score is explained by the three independent variables, while the remaining 33.54% is attributable to other factors.
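The summary figures quoted above follow directly from the sums of squares and coefficients in Tables 1 to 3. The short check below recomputes them; the input numbers are copied from the tables, while the variable names are chosen here purely for illustration.

```python
import math

# Values copied from Table 2 (ANOVA).
ss_regression = 263940.8
ss_residual = 133177.3
ss_total = 397118.1
n_obs = 50
n_predictors = 3
df_residual = n_obs - n_predictors - 1  # 46

# R Square: share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R Square penalises the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# F statistic: mean square regression over mean square residual.
ms_regression = ss_regression / n_predictors
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Standard error of the regression is the root mean square residual.
standard_error = math.sqrt(ms_residual)

# t statistic for each term: coefficient divided by its standard error (Table 3).
terms = {
    "Years of Credit History": (-2.09848, 1.490347),
    "Revolving Balance": (0.001565, 0.000887),
    "Revolving Utilization": (-246.228, 27.93194),
}
t_stats = {name: coef / se for name, (coef, se) in terms.items()}

print(round(r_squared, 6), round(adj_r_squared, 6), round(f_stat, 4))
for name, t in t_stats.items():
    print(name, round(t, 3))
```

Small differences in the last decimal places are expected because the table entries are themselves rounded.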
Even though some coefficients cannot be used to forecast the credit score, interaction terms are not considered necessary in this regression because it involves only continuous variables. Multicollinearity is not problematic because the strongest correlation between any two independent variables is -0.49, which exists between revolving utilization and years of credit history, as illustrated in the table below.
Table 4: Correlation Analysis
Variable | Credit Score | Years of Credit History | Revolving Balance | Revolving Utilization |
Credit Score | 1 | | | |
Years of Credit History | 0.311459966 | 1 | | |
Revolving Balance | 0.007546214 | 0.129899328 | 1 | |
Revolving Utilization | -0.796420254 | -0.487016047 | 0.147135738 | 1 |
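Pairwise correlations are one way to screen for multicollinearity; a complementary check is the variance inflation factor (VIF), which for each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. The sketch below applies this to the three predictor correlations in Table 4. The VIF check is an addition for illustration and is not part of the original Excel output.

```python
import numpy as np

# Correlations among the three predictors, copied from Table 4.
r_years_balance = 0.129899328
r_years_util = -0.487016047
r_balance_util = 0.147135738

predictors = ["Years of Credit History", "Revolving Balance", "Revolving Utilization"]
corr = np.array([
    [1.0, r_years_balance, r_years_util],
    [r_years_balance, 1.0, r_balance_util],
    [r_years_util, r_balance_util, 1.0],
])

# VIF_j is the j-th diagonal element of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(corr))

for name, value in zip(predictors, vif):
    print(f"{name}: VIF = {value:.3f}")
```

All three values come out between roughly 1.1 and 1.4, well under the commonly used rule-of-thumb thresholds of 5 or 10, which is consistent with the conclusion that multicollinearity is not a concern here.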
The residual plots in Appendix 1 suggest that the residuals are approximately normally distributed. Moreover, the graph below (Squared Residuals against Revolving Utilization) shows no evidence of heteroskedasticity. Finally, the Durbin-Watson statistic is 1.376, which is taken to be close enough to 2 that autocorrelation is not treated as a problem.
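For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below implements that definition; the residual vectors are hypothetical, since the report's residuals are not listed, and the final line uses the common approximation DW ≈ 2(1 - r) to express the reported 1.376 as an implied lag-one residual correlation.

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Hypothetical residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]   # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]        # strong positive autocorrelation

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)

# Approximate lag-one correlation implied by the reported statistic.
reported_dw = 1.376
implied_rho = 1 - reported_dw / 2

print(dw_alternating, dw_constant, round(implied_rho, 3))
```

The statistic ranges from 0 to 4, with 2 indicating no first-order autocorrelation. The reported value corresponds to an implied correlation of about 0.31, which a reader may wish to compare against tabulated Durbin-Watson bounds for 50 observations and three predictors.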
The most appropriate model for forecasting the credit score can be derived through a significance-based variable selection procedure (backward elimination). In this procedure, all the variables with regression coefficients that are not statistically significant are omitted from the regression model. In light of this, the most appropriate model will include only the revolving utilization as a predictor, together with the intercept. Consequently, the simple regression equation should be obtained from a regression on revolving utilization alone, as illustrated below.
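A selection procedure of this kind can be sketched in code. The version below is a stepwise variant that removes the weakest predictor one at a time and refits, rather than dropping all non-significant terms at once as the report does. It runs on simulated data (the report's observations are not available), and it uses a fixed threshold of |t| = 2.0 as a rough stand-in for the 5% significance level; the function names, the threshold and the data are all assumptions made for illustration.

```python
import numpy as np


def ols_t_stats(design, y):
    """Return t statistics for an OLS fit; design must include the intercept column."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = residuals @ residuals / (n - k)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta / np.sqrt(np.diag(cov))


def backward_eliminate(X, y, names, t_threshold=2.0):
    """Drop the weakest predictor until every remaining |t| reaches the threshold."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        t_pred = np.abs(ols_t_stats(design, y)[1:])  # skip the intercept
        weakest = int(np.argmin(t_pred))
        if t_pred[weakest] >= t_threshold:
            break
        keep.pop(weakest)
    return [names[i] for i in keep]


# Simulated data in which only utilization truly drives the score.
rng = np.random.default_rng(1)
n = 50
years = rng.uniform(1, 30, n)
balance_thousands = rng.uniform(0, 30, n)
utilization = rng.uniform(0, 1, n)
score = 760 - 220 * utilization + rng.normal(0, 55, n)

X = np.column_stack([years, balance_thousands, utilization])
names = ["years_of_credit_history", "revolving_balance", "revolving_utilization"]
selected = backward_eliminate(X, score, names)
print(selected)
```

Refitting after each removal matters because the significance of the remaining coefficients can change once a correlated predictor is dropped.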
Table 5: SUMMARY OUTPUT
Regression Statistics | |
Multiple R | 0.79642 |
R Square | 0.634285 |
Adjusted R Square | 0.626666 |
Standard Error | 55.00605 |
Observations | 50 |
Table 6: ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 1 | 251886.1 | 251886.1 | 83.24982 | 4.66E-12 |
Residual | 48 | 145232 | 3025.666 | | |
Total | 49 | 397118.1 | | | |
Table 7: COEFFICIENTS
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 757.9223 | 13.94888 | 54.33572 | 9.12E-45 | 729.8762 | 785.9684 |
Revolving Utilization | -220.732 | 24.1921 | -9.12413 | 4.66E-12 | -269.373 | -172.09 |
Using the regression results illustrated above, the simple regression line that would serve as the most reliable predictor for credit score is:
Credit Score = 757.9223 - 220.732 × (Revolving Utilization)
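As a brief illustration, the intercept and slope from Table 7 can be wrapped in a small prediction function. The sketch assumes that revolving utilization is recorded as a proportion (for example 0.30 for 30%), which the report does not state explicitly; the function and constant names are chosen here for illustration. It also checks that, in a one-predictor regression, R Square equals the squared correlation between the two variables reported in Table 4.

```python
# Coefficients copied from Table 7.
INTERCEPT = 757.9223
SLOPE = -220.732


def predict_credit_score(revolving_utilization):
    """Predicted credit score; utilization is assumed to be a proportion."""
    return INTERCEPT + SLOPE * revolving_utilization


score_at_zero = predict_credit_score(0.0)
score_at_30_percent = predict_credit_score(0.30)

# Change in predicted score for a 0.10 increase in utilization.
change_per_tenth = predict_credit_score(0.40) - predict_credit_score(0.30)

# Correlation between credit score and revolving utilization (Table 4).
correlation = -0.796420254
r_squared_from_correlation = correlation ** 2

print(round(score_at_30_percent, 2), round(change_per_tenth, 2))
print(round(r_squared_from_correlation, 6))
```

Under that assumption, each additional 0.10 of utilization lowers the predicted score by about 22 points, and the squared correlation reproduces the R Square of 0.634285 in Table 5.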
Conclusion
The coefficients for years of credit history and revolving balance are not statistically significant, and as such the regression should only be conducted with the revolving utilization. From the second regression, it is evident that there is an inverse relationship between credit score and revolving utilization. Specifically, a one-unit increase in revolving utilization is associated with a decline in the credit score of 220.73 points.
Appendix
Appendix 1: Residual Plots
Appendix 2: Line Fit Plots