

Mohit Arora
Happy Learning !!!
About the Data
​
In this project we build a linear regression model to predict a student's final marks from the predictors (independent variables) given in the file, such as gpa, quiz1, quiz2, quiz3, quiz4, quiz5 and total. Let's first import the file named grades.csv.
The imported data has 105 observations and 22 variables. These 22 variables include continuous as well as categorical variables. Only the continuous (scaled) variables are considered as predictors for building our model.
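As a rough illustration of the import step, here is a minimal Python sketch that reads a CSV file and picks out the numeric columns. The columns `gpa`, `quiz1` to `quiz5`, `total` and `final` come from the text; the `id` and `gender` columns and all three sample rows are invented stand-ins, since the real grades.csv is not reproduced here.

```python
import csv
import io

# Hypothetical stand-in for grades.csv: "id", "gender" and every value
# below are invented purely for illustration.
SAMPLE_CSV = """id,gender,gpa,quiz1,quiz2,quiz3,quiz4,quiz5,total,final
1,F,3.10,8,9,10,9,8,108,66
2,M,2.45,6,7,8,7,8,95,58
3,F,3.80,10,9,10,10,9,118,72
"""

def load_rows(handle):
    """Read a CSV file object into a list of dicts, one per student."""
    return list(csv.DictReader(handle))

def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def numeric_columns(rows):
    """Return the names of columns whose every value parses as a number."""
    if not rows:
        return []
    return [name for name in rows[0] if all(is_number(r[name]) for r in rows)]

rows = load_rows(io.StringIO(SAMPLE_CSV))
# With the real file: rows = load_rows(open("grades.csv", newline=""))
print(len(rows), "observations,", len(rows[0]), "variables")
print(numeric_columns(rows))
```

Note that an identifier column such as `id` also parses as numeric, so the candidate predictors still need to be chosen by hand.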
Box plots and histograms are drawn to check the normality of the data for each variable.
​
The graphs and plots for the variables below can be found in the data visualization section.
1. GPA
The GPA data has a mean of 2.78 and an sd of 0.76. The mean and trimmed mean are very close, as are the sd and MAD, which suggests there are no influential outliers. The skewness value of -0.05 means the data is very slightly left (negatively) skewed, that is, almost symmetric. The kurtosis value of -0.87 shows that the data is platykurtic (lighter tails and a flatter central peak than a normal distribution).
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 105 2.78 0.76 2.72 2.8 0.76 1.14 4 2.86 -0.05 -0.87
## se
## X1 0.07
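The columns in the summary output above can be approximated with a short Python sketch. It assumes a 10% trim on each side, a MAD scaled by 1.4826, and plain moment-based skewness and excess kurtosis; the tool that produced the output may use slightly different estimators, so small differences are possible. The sample values are invented.

```python
import math
import statistics

def describe(values, trim=0.1):
    """Summary statistics similar to the columns shown in the output.

    Assumptions: `trim` is the share dropped from each end for the trimmed
    mean, MAD is scaled by 1.4826, and skew/kurtosis are simple moment
    estimates (kurtosis is excess kurtosis, so a normal curve gives 0).
    """
    x = sorted(values)
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)            # sample sd (n - 1 denominator)
    median = statistics.median(x)
    k = int(n * trim)                   # observations dropped per tail
    trimmed = statistics.fmean(x[k:n - k])
    mad = 1.4826 * statistics.median([abs(v - median) for v in x])
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "n": n, "mean": mean, "sd": sd, "median": median,
        "trimmed": trimmed, "mad": mad,
        "min": x[0], "max": x[-1], "range": x[-1] - x[0],
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,
        "se": sd / math.sqrt(n),        # standard error of the mean
    }

# Invented example data, not taken from grades.csv.
summary = describe([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

For this evenly spread sample the skewness is 0 and the excess kurtosis is negative, which is the platykurtic case described for GPA.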
2. Quiz1
The data has a mean of 7.47 and an sd of 2.48. The histogram and box plot show that the data is left (negatively) skewed, with a skewness value of -0.83. The kurtosis value is 0.04, which is close to 0, so the tails and peak are close to those of a normal distribution, with only marginally heavier tails (very slightly leptokurtic). The range of quiz1 is 0 to 10.
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 105 7.47 2.48 8 7.76 2.97 0 10 10 -0.83 0.04
## se
## X1 0.24
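The way skewness and kurtosis values are read in these sections can be written down as a small helper. This is only a sketch: the tolerance `tol` that decides when a value counts as "close to 0" is an arbitrary assumption, not a standard cut-off, and kurtosis is taken to be excess kurtosis (0 for a normal distribution).

```python
def describe_shape(skew, excess_kurtosis, tol=0.1):
    """Label a distribution from its skewness and excess kurtosis.

    `tol` is a hypothetical threshold: values within +/- tol of zero are
    treated as approximately symmetric or approximately mesokurtic.
    """
    if abs(skew) <= tol:
        skew_label = "approximately symmetric"
    elif skew < 0:
        skew_label = "left (negatively) skewed"
    else:
        skew_label = "right (positively) skewed"

    if abs(excess_kurtosis) <= tol:
        kurt_label = "approximately mesokurtic"
    elif excess_kurtosis < 0:
        kurt_label = "platykurtic"
    else:
        kurt_label = "leptokurtic"
    return skew_label, kurt_label

# Values taken from the summary tables for gpa and quiz1.
print("gpa:  ", describe_shape(-0.05, -0.87))
print("quiz1:", describe_shape(-0.83, 0.04))
```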
​
3. Quiz2
The data has a mean of 7.98 and an sd of 1.62. The histogram shows that the data is slightly skewed to the left. The box plot shows a roughly normal distribution, again slightly left skewed, with one outlier at 3. The range of quiz2 is 3 to 10. The skewness value of -0.64 indicates a slight left skew, and the kurtosis value of -0.35 indicates a platykurtic distribution with lighter tails and a flatter central peak.
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 105 7.98 1.62 8 8.12 1.48 3 10 7 -0.64 -0.35
## se
## X1 0.16
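A box plot usually flags as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that common rule in Python; the scores are invented to resemble quiz data, and the exact quartile method used by the original plots is not known, so the fences may differ slightly.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return the values outside the fences Q1 - w*IQR and Q3 + w*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

# Invented scores out of 10, not the real quiz2 column.
scores = [3, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10]
print(boxplot_outliers(scores))
```

Here the quartiles are 7 and 9, so the fences sit at 4 and 12 and only the score of 3 is flagged.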
​
4. Quiz3
The data has a mean of 7.98 and an sd of 2.31 (observations typically deviate from the mean by about 2.31 points). The histogram shows that the data is skewed towards the left, and the box plot shows that the data is not normally distributed but clearly left skewed. The range of quiz3 is 0 to 10. The skewness value of -1.1 means the data is left skewed. The kurtosis value of 0.59 indicates a leptokurtic distribution with heavier tails and a high, sharp peak.
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 105 7.98 2.31 9 8.34 1.48 0 10 10 -1.1 0.59 0.23
​
5. Quiz4
The quiz4 data has a mean of 7.8 and an sd of 2.28 (observations typically deviate from the mean by about 2.28 points). The histogram shows that the data is skewed towards the left, and the box plot shows that the data is not normally distributed but clearly left skewed. The range of quiz4 is 2 to 10. The skewness value of -0.89 means the data is left skewed. The kurtosis value of -0.09 is slightly below 0, indicating a mildly platykurtic distribution with lighter tails and a flatter central peak.
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 105 7.8 2.28 8 8.11 2.97 2 10 8 -0.89 -0.09
## se
## X1 0.22
​
6. Quiz5
The data has a mean of 7.87 and an sd of 1.77 (observations typically deviate from the mean by about 1.77 points). The histogram shows that the data is skewed towards the left. The box plot also shows a slight left skew, but the data can be regarded as roughly normally distributed. The range of quiz5 is 2 to 10. The skewness value of -0.69 means the data is somewhat left skewed. The kurtosis value of 0.16 indicates a slightly leptokurtic distribution, with heavier tails and a sharper central peak than a normal distribution.
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 105 7.87 1.77 8 8.02 1.48 2 10 8 -0.69 0.16
## se
## X1 0.17
​
7. Total
The data has a mean of 100.57 and an sd of 15.3. The mean (100.57) and trimmed mean (101.8) are close, as are the sd (15.3) and MAD (13.34), which suggests that outliers have little influence, although the minimum of 51 lies more than three standard deviations below the mean. Most of the data lies on the right side, so the distribution is left (negatively) skewed, as the skewness value of -0.81 confirms. The kurtosis value of 0.77 shows that the data is leptokurtic, with heavier tails and a sharper central peak.
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 105 100.57 15.3 103 101.8 13.34 51 124 73 -0.81 0.77
## se
## X1 1.49
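One quick way to see how extreme the smallest and largest totals are is to express them in standard deviations from the mean (z-scores). The sketch below uses only the rounded figures from the table above, so the results are approximate.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Rounded summary figures copied from the table for total.
mean, sd = 100.57, 15.3
z_min = z_score(51, mean, sd)    # smallest observed total
z_max = z_score(124, mean, sd)   # largest observed total
print(f"min z = {z_min:.2f}, max z = {z_max:.2f}")
```

A strongly negative z-score for the minimum alongside a modest one for the maximum is consistent with a longer left tail.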
​
8. Response Variable Final
The final variable has a mean of 61.48. It is numeric with a range of 40 to 75, as seen from the min and max. The standard deviation is 7.94, which means that, if the data is approximately normal, about 68% of the observations lie within one standard deviation of the mean, [61.48 - 1*7.94, 61.48 + 1*7.94], i.e. between 53.54 and 69.42, whereas about 95% lie within two standard deviations, [61.48 - 2*7.94, 61.48 + 2*7.94], i.e. between 45.6 and 77.36. The median final score is 62. The histogram shows that final is almost normally distributed around its mean but slightly skewed to the left, with an outlier at 40. The trimmed mean of final is 61.74; it is obtained by dropping a fixed share of the lowest and highest observations before averaging, which reduces the effect of values far from the mean. The skewness value of -0.33 means the data is slightly left skewed, owing to the low outlier. The kurtosis value of -0.42 means the distribution has lighter, thinner tails and a lower, broader central peak than a normal distribution; hence the data is platykurtic.
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 105 61.48 7.94 62 61.74 8.9 40 75 35 -0.33 -0.42
## se
## X1 0.78
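The one- and two-standard-deviation ranges quoted for final can be reproduced with a few lines of Python. This sketch uses the rounded mean and sd from the table; the helper `share_within` and its sample values are illustrative additions for checking how many observations actually fall inside such a range.

```python
def sd_interval(mean, sd, k):
    """Return (lower, upper) bounds of mean +/- k standard deviations."""
    return round(mean - k * sd, 2), round(mean + k * sd, 2)

def share_within(values, lower, upper):
    """Fraction of values lying inside [lower, upper]."""
    inside = sum(1 for v in values if lower <= v <= upper)
    return inside / len(values)

# Rounded summary figures for final, taken from the table.
mean, sd = 61.48, 7.94
one_sd = sd_interval(mean, sd, 1)   # about 68% of a normal distribution
two_sd = sd_interval(mean, sd, 2)   # about 95% of a normal distribution
print("within 1 sd:", one_sd)
print("within 2 sd:", two_sd)

# Invented scores, only to show how the share could be checked on real data.
sample = [40, 55, 58, 60, 62, 63, 65, 68, 70, 75]
print("share within 1 sd:", share_within(sample, *one_sd))
```

The upper two-sd bound of 77.36 is above the observed maximum of 75, so in practice the upper end of that range is empty.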