

Mohit Arora
Happy Learning !!!
Finding the best Prediction Model Using Linear Regression
Several different models can be built from the available predictors. Let's start building linear regression models.

From the above model we can see that quiz1, quiz2, quiz3, quiz4 and quiz5 have p-values greater than the level of significance (LOS, alpha = 0.05). We therefore fail to reject the null hypothesis H0 (the slope is not significant) for these predictors: in this linear regression model they show no significant linear relationship with final. The predictors together explain 49.22% of the variance in the dependent variable final, and the adjusted R-squared of 0.4611 means the model explains 46.11% of that variance once the number of predictors is taken into account. We will not use this model, because it has many predictors and the slope of every predictor except gpa is not significant. Note that quiz3 has the p-value closest to the LOS of 0.05, so quiz3 remains a candidate predictor.

Removing quiz1 from the above model has little effect on R-squared: this model still explains 49.06% of the variance in final. But here again the p-values of quiz2, quiz4 and quiz5 are greater than the LOS (alpha = 0.05), meaning their slopes are not significant for final. We therefore remove these predictors as well, since we want the smallest number of predictors that does not compromise much on R-squared.


In this model the p-values of both gpa and quiz3 are less than the LOS (alpha = 0.05), meaning both slopes are significant for the dependent variable. Moreover, since we want as few variables as possible, and gpa and quiz3 together explain 45.35% of the variance in final, this is one of the best models to consider for predicting the response variable final.
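As a rough illustration of how a two-predictor model such as final ~ gpa + quiz3 can be fitted and its R-squared computed, here is a minimal sketch using NumPy least squares. The data are synthetic stand-ins (the original dataset is not reproduced here), and the function name `fit_ols`, the coefficients used to generate the data, and the example student (gpa 3.0, quiz3 score 8) are all assumptions made only for this example.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns the coefficient vector (intercept first) and R-squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)            # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())      # total variation
    return coef, 1.0 - ss_res / ss_tot

# Synthetic stand-in data: NOT the data behind the figures quoted in the text.
rng = np.random.default_rng(0)
n = 100
gpa = rng.uniform(1.5, 4.0, n)
quiz3 = rng.integers(0, 11, n).astype(float)
final = 30.0 + 6.0 * gpa + 1.5 * quiz3 + rng.normal(0.0, 5.0, n)

coef, r2 = fit_ols(np.column_stack([gpa, quiz3]), final)

# Predicted final score for a hypothetical student with gpa 3.0 and quiz3 = 8.
predicted = coef[0] + coef[1] * 3.0 + coef[2] * 8.0
print("coefficients:", coef, "R-squared:", round(r2, 4), "prediction:", round(predicted, 2))
```

The R-squared printed here describes only the synthetic data; on the real data it would be the share of variance in final explained by gpa and quiz3.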
​
Both predictors here have the same variance inflation factor (VIF), and since that VIF is only around 1.06 they are not strongly correlated with each other.
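With exactly two predictors, the VIF of each is 1 / (1 - r^2), where r is the correlation between them, which is why both predictors share the same value. The sketch below shows that relationship; the function names are hypothetical, and the correlation of 0.238 is simply the value that a VIF of about 1.06 implies, not a figure taken from the model output.

```python
import numpy as np

def vif_from_correlation(r):
    """VIF of either predictor in a two-predictor model with correlation r."""
    return 1.0 / (1.0 - r ** 2)

def vif_two_predictors(x1, x2):
    """Compute the shared VIF of two predictors from their sample correlation."""
    r = np.corrcoef(x1, x2)[0, 1]
    return vif_from_correlation(r)

# A correlation of about 0.238 corresponds to a VIF close to 1.06.
print(round(vif_from_correlation(0.238), 2))
```

A VIF near 1 means almost none of a predictor's variance is explained by the other predictor, so multicollinearity is not a concern.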
​
Because gpa is itself derived from all the marks in quiz 1-5, final and total, using gpa as a predictor requires the principal to have GPA scores readily available when predicting final scores (which are used to calculate gpa). For future students, the principal can use the model with an assumed GPA to predict the range in which a student's final score is expected to fall. The linear regression model that can be considered best for predicting final is final~gpa+quiz3. It gives a prediction of final based on the actual gpa and quiz3 score for students whose data is available, or on an assumed gpa and quiz3 score for future students. This model explains 45.35% of the variance in final.
​

With a condition attached, the best linear regression model can use total as a predictor of final. It applies to students who have scores in quiz 1-5 and final and a gpa available (and hence a total), or to future students who have no scores yet, for whom an estimated total and quiz3 score are assumed. Using this model, the principal can predict that a student with a given total and quiz3 score should have a final score in a particular range. Total combined with quiz3 explains 87.31% of the variance in final (y). Future students cannot use this model directly to predict their final score, because they do not have a total score available. The principal can still use it to get an idea of the range in which the final score would lie for a given assumed total and quiz3 score, with 87.31% of the variance in final explained. Moreover, the VIF in this model is not artificially inflated by either of the two predictors.


In this model, quiz3 alone explains around 31.49% of the variance in final. Hence we can consider it an average to good predictor model.

Of R-squared and adjusted R-squared, adjusted R-squared is the better measure for comparing models, because it takes the number of predictors into account. R-squared measures the proportion of the variation in the dependent variable (Y) explained by the independent variables (X) in a linear regression model. It never decreases when a predictor is added, so it treats every independent variable as if it contributed to explaining the dependent variable. Adjusted R-squared adjusts the statistic for the number of independent variables in the model, so it rises only when a new predictor improves the fit by more than would be expected by chance. In our models the two values differ very little, so either can be used here.
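The adjustment can be written as adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below applies it to the first model's R-squared of 0.4922. The sample size of 105 and the predictor count of 6 (gpa plus quiz1 to quiz5) are assumptions chosen for illustration, since neither is stated in the text; under those assumptions the result comes out close to the reported 0.4611.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared from R-squared, sample size and predictor count."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# n_obs=105 and n_predictors=6 are assumed values, not taken from the text.
print(round(adjusted_r2(0.4922, n_obs=105, n_predictors=6), 4))
```

The penalty grows with the number of predictors, which is why dropping non-significant quizzes costs little adjusted R-squared.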