Data Analytics for Management (Subject) / Multiple Regression (Lesson)

This lesson contains 23 flashcards

This lesson was created by Janaw55.

  • R Squared Interpretation vs. Adjusted R Square. R Square: the percentage of the variation of Y that can be explained by the regression line. Interpretation: "A single explanatory variable X is able to explain only A% of the variation in the variable Y. In turn, B% of the variation is still unexplained." - Should be as close to 1 as possible: increase it by adding better and/or more explanatory variables. Before a single variable X can explain a large percentage of the variation in some other variable Y, the two variables must be highly correlated in either a positive or a negative direction. - Good for comparison, as it provides evidence of which X is the better predictor of Y. - R Square is the square of the correlation between the observed Y values and the fitted Y values. With a single explanatory variable, if the correlation between X and Y is 0.8, R Square will be 0.64; if the correlation drops to 0.7, the percentage drops to 49%. Adjusted R Square: the adjusted R Square indicates whether added explanatory variables are worth including. If it is noticeably lower than R Square, the added explanatory variables contribute little and can be omitted. While R2 treats every variable as if it explained variation in the dependent variable, the adjusted R2 tells you the percentage of variation explained by only the independent variables that actually affect the dependent variable. The adjusted R2 penalizes you for adding independent variables (k in the equation) that do not fit the model. If you add more and more useless variables to a model, the adjusted R2 will decrease. If you add more useful variables, the adjusted R2 will increase. The adjusted R2 is always less than or equal to R2. The adjustment is only needed when working with samples.
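The relationship between the correlation, R Square and adjusted R Square described on this card can be checked with a small sketch. The data are hypothetical and the helper names (`pearson_r`, `fit_line`, `adjusted_r_squared`) are illustrative, not taken from the lesson; the adjustment uses the common formula 1 - (1 - R2)(n - 1)/(n - k - 1).

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least squares intercept and slope for one explanatory variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def r_squared(ys, fitted):
    """Share of the variation in Y explained by the fitted values."""
    my = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Penalise R^2 for using k explanatory variables on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical sample, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.8, 8.1, 9.2]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
r2 = r_squared(y, fitted)
adj_r2 = adjusted_r_squared(r2, n=len(x), k=1)

print("correlation squared:", pearson_r(x, y) ** 2)
print("R Square:           ", r2)
print("adjusted R Square:  ", adj_r2)
```

With one explanatory variable the first two printed numbers agree, and the adjusted value is never larger than R Square.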
  • 5 Basic Assumptions of Regression Model (1) Linearity Assumption: There is a population regression line. It joins the means of Y for all values of the Xs. All values of the Xs are assumed fixed. The only randomness in the values of Y comes from the error term ε_i. The mean of the errors is 0 for any fixed values of the Xs. (2) Uniform Variance (Homoscedasticity) Assumption: For any values of the Xs, the variance of Y is constant. (3) Normality Assumption: The errors ε_i are normally distributed with mean 0, i.e., ε_i ~ N(0, σ²). (4) No Autocorrelation and No Multicollinearity Assumption: The errors are uncorrelated in successive observations, and no X is an exact linear combination of the other Xs.
  • (1) Linearity Assumption A population regression line exists. It joins the means of Y for all values of the Xs. For any fixed values of the Xs, the mean of the errors is 0. We estimate the coefficients of the population regression line from sample data, using the least squares method. There is an exact linear relationship in the population between the means of the Ys and the values of the Xs. This holds true for dummies, interactions, and nonlinear transformations. CHECK BY: Scatterplots of Y against each of the predictors are reasonably straight (no strong bend). Check the residuals by plotting them against the fitted Y values and looking for patterns.
  • Error vs. Residual Error: the vertical distance from a point to the true, population regression line. The error for any point, labeled ε, is the difference between Y and μ_{Y|X1,…,Xk}, that is, Y = μ_{Y|X1,…,Xk} + ε. It cannot be calculated. Residual: the vertical distance from a point to the estimated regression line. Residuals can be calculated from observed data.
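The distinction can be made concrete with a simulation, which is the only setting in which the errors are actually known. This is a minimal sketch: the "true" line (`ALPHA`, `BETA`) and the noise level are made-up assumptions, and with real data only the residuals would be available.

```python
import random

random.seed(0)

# Hypothetical population regression line, known only because we simulate.
ALPHA, BETA = 2.0, 0.5

x = [float(i) for i in range(1, 21)]
errors = [random.gauss(0, 1) for _ in x]  # distances to the TRUE line
y = [ALPHA + BETA * xi + e for xi, e in zip(x, errors)]

# Least squares estimates a and b of the intercept and slope.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
a = my - b * mx

# Residuals: distances to the ESTIMATED line, computable from the data.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for e, r in list(zip(errors, residuals))[:5]:
    print(f"error {e:+.3f}   residual {r:+.3f}")
```

The two columns are close but not identical, because the estimated line differs from the true one.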
  • (2) Uniform Variance (Homoscedasticity) For any values of the Xs, the variance of Y is constant. Homoscedasticity: residuals appear completely random, with no indication of model inadequacies. However, the assumption is almost always violated to some extent; mild violations do not have much effect on the validity of the regression output. Violation: non-constant error variance (heteroscedasticity): increases in a variable result in increases in variability, i.e. the fan-shape phenomenon. Important, because: it can cause an incorrect value for the standard error of estimate, so that confidence intervals and hypothesis tests for the regression coefficients are not valid. Solve by: a logarithmic transformation of the dependent variable (Y), or a different estimation method than least squares, called weighted least squares. Detect by: a scatterplot of the dependent variable vs. an explanatory variable X, or a scatterplot of the standardized residuals vs. the fitted Y values.
  • (4) No Multicollinearity, No Autocorrelation The errors are uncorrelated in successive observations: information on some errors gives no information on the values of other errors. Violation (1): Autocorrelation of residuals, e.g. in time-series data, or in cross-sectional data where the observations are ordered in some particular way. Residuals are often correlated with nearby residuals, a property called autocorrelation of residuals (the most frequent type is positive autocorrelation). If residuals separated by one time period are correlated, it is called lag 1 autocorrelation. Detect by: The Durbin-Watson (DW) statistic is a numerical measure used to check for lag 1 autocorrelation. A DW statistic below 2 signals that nearby residuals are positively correlated with one another. When the number of observations is about 30 and the number of explanatory variables is fairly small, any DW statistic less than 1.2 warrants attention. Violation (2): Multicollinearity, i.e. strong correlations between two or more predictors in a regression model. Important, because: no X should be an exact linear combination of other Xs. Exact multicollinearity means redundancy in the data: one of the Xs is not needed and can be eliminated without any loss of information. Detect by: CORRELATION MATRIX; large p-values and small t-values (candidates for exclusion); wrong values for the coefficients.
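The Durbin-Watson statistic mentioned on this card is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch follows; the two residual series are invented purely to show the two extremes.

```python
def durbin_watson(residuals):
    """DW statistic: values near 2 suggest no lag 1 autocorrelation,
    values well below 2 suggest positive lag 1 autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly: neighbours resemble each other.
trending = [1.0, 0.8, 0.9, 0.5, 0.2, -0.1, -0.4, -0.6, -0.9, -1.0]

# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1.0, -1.0] * 5

print("trending:   ", durbin_watson(trending))
print("alternating:", durbin_watson(alternating))
```

The drifting series gives a value far below the 1.2 threshold quoted above, while the alternating series lands well above 2.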
  • The principle of parsimony Explain the most with the least. It favors a model with fewer explanatory variables, assuming that this model explains the dependent variable almost as well as a model with additional explanatory variables. Goal: Determine equation with the best set of explanatory variables. The estimates of α and the βs are the least squares estimates of the intercept and slope terms.    
  • Sampling Distribution of a Regression Coefficient The idea of a sampling distribution can be applied to the least squares estimate of a regression coefficient. The sampling distribution of b, the least squares estimate of β, is the distribution of bs you would see if you observed many samples and ran a least squares regression on each of them. We state the main result as follows. Let β be any of the βs, and let b be the least squares estimate of β. If the regression assumptions are valid, the standardized value t = (b − β) / s_b has a t distribution with n − k − 1 degrees of freedom. Implications: The estimate b is unbiased in the sense that its mean is β, the true but unknown value of the slope. If bs were estimated from repeated samples, some would underestimate β and others would overestimate β, but on average they would be on target. The estimated standard deviation of b is labeled s_b. It is usually called the standard error of a regression coefficient, or more simply, the standard error of b. This standard error is related to the standard error of estimate s_e, but it is not the same.
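The repeated-sampling idea can be imitated by simulation. This is a sketch under made-up assumptions: a hypothetical population line (`ALPHA`, `BETA`), normal errors with standard deviation `SIGMA`, fixed X values, and 2000 simulated samples.

```python
import random
import statistics

random.seed(42)


def ols_slope(xs, ys):
    """Least squares slope for a single explanatory variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )


# Hypothetical population values, chosen only for the illustration.
ALPHA, BETA, SIGMA = 1.0, 2.0, 3.0
x = list(range(1, 31))  # the Xs are held fixed across samples

slopes = []
for _ in range(2000):
    y = [ALPHA + BETA * xi + random.gauss(0, SIGMA) for xi in x]
    slopes.append(ols_slope(x, y))

mean_b = statistics.mean(slopes)
sd_b = statistics.stdev(slopes)
print("mean of the bs:", mean_b)
print("spread of the bs:", sd_b)
```

Individual slopes fall on both sides of `BETA`, but their mean sits very close to it, which is what unbiasedness means; the spread of the bs is what s_b estimates from a single sample.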
  • (3) Normality Assumption For any fixed value of X, the errors are normally distributed with zero mean, i.e. ε ∼ N(0, σ²). - Unless the distribution of the residuals is severely nonnormal, the inferences made from the regression output are still approximately valid. - If the normality assumption is met, the ratios from hypothesis tests follow a Student's t-distribution. Important, because: we can use a Student's t-model for inference (ANOVA, coefficient test). The normality assumption becomes less important as the sample size grows (CLT). Check by: HISTOGRAM of standardized residuals; normal probability plot (a Q-Q plot produced by Excel, P-P plot in SPSS). Example: one form of nonnormality often encountered is skewness to the right.
  • Treatment of Outliers A large outlier can strongly influence the results and should be examined. Depending upon where the outlier falls, the correlation coefficient may be increased or decreased. The smaller the sample size, the greater the effect of the outlier. With a large enough sample, the outlier will have little or no effect on the size of the correlation coefficient. The decision whether to include or exclude an outlier remains with the researcher. However, he or she must justify deleting data to the reader of a technical report. Alternatively: compute the correlation coefficient with and without the outlier. Spot by: Scatterplot. The decision should be made with regard to: (1) Why is it an outlier? Was the respondent deliberately giving a wrong answer, did he or she not understand the question, or was it a typing error? (2) The outlier is real and simply different.
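Computing the correlation coefficient with and without the suspect point takes only a few lines. The sketch below uses invented data in which the last observation is the outlier.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical sample: eight well-behaved points plus one outlier at the end.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.9, 7.1, 8.0, 1.0]

r_with = pearson_r(x, y)
r_without = pearson_r(x[:-1], y[:-1])

print("with outlier:   ", r_with)
print("without outlier:", r_without)
```

Reporting both values lets the reader judge how much the conclusion depends on a single observation.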
  • What to do with Counter Intuitive Results An indication that something has been overlooked, e.g. sales appear to decline with increasing advertising. However, there are two clusters of points. The two clusters might represent "new products" and "established products", and it might be dangerous to treat them as if they were the same. Solve by: Include a dummy variable to allow for a shift, or model the two clusters separately.
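The two-cluster situation can be reproduced with a toy data set. All numbers below are invented: within each hypothetical product group sales rise with advertising, yet the pooled slope is negative.

```python
def ols_slope(xs, ys):
    """Least squares slope for a single explanatory variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )


# Hypothetical data: (advertising, sales) for two product groups.
established_x, established_y = [1, 2, 3], [10, 11, 12]
new_x, new_y = [6, 7, 8], [3, 4, 5]

pooled_slope = ols_slope(established_x + new_x, established_y + new_y)
established_slope = ols_slope(established_x, established_y)
new_slope = ols_slope(new_x, new_y)

print("pooled slope:     ", pooled_slope)
print("established slope:", established_slope)
print("new slope:        ", new_slope)
```

Modelling the clusters separately recovers the positive relationship that the pooled regression hides.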
  • Introducing Dummies, Binary Variables, Categorical Variables Some explanatory variables are categorical and cannot be measured on a quantitative scale. We must follow two rules: 1. Don't use any of the original categorical variables that the dummies are based on. 2. Always use one fewer dummy than the number of categories for any categorical variable. The omitted dummy then corresponds to the reference category. If we include a dummy for every category (for example, all five of five), any statistical package will give us a "perfect multicollinearity" error message. It doesn't matter which dummy is omitted.
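The "one fewer dummy than categories" rule can be sketched as a small helper. The function name `make_dummies` and the neighbourhood labels are hypothetical, chosen only for illustration.

```python
def make_dummies(values, reference):
    """Return one 0/1 column per category except the reference category."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"unknown reference category: {reference!r}")
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
        if category != reference
    }


# Hypothetical categorical variable with three categories.
neighbourhood = ["A", "B", "C", "A", "C"]

dummies = make_dummies(neighbourhood, reference="A")
for name, column in dummies.items():
    print(name, column)
```

Three categories produce two dummy columns; an observation with zeros in both belongs to the reference category "A".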
  • Interaction Variables Needed when there is reason to believe that the effect of one X depends on the value of another X. They make the model more realistic by allowing the regression lines to have different slopes. No interaction: when you include only a dummy variable in a regression equation, you are allowing the lines to differ, but you are forcing them to be parallel.
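Written out for one quantitative variable X and one dummy D (symbols chosen here for illustration, not taken from the lesson), the interaction term is what lets the slope differ between the two groups:

```latex
\begin{aligned}
Y &= \alpha + \beta_1 X + \beta_2 D + \beta_3 (X \cdot D) + \varepsilon \\
D = 0:&\quad \mu_Y = \alpha + \beta_1 X \\
D = 1:&\quad \mu_Y = (\alpha + \beta_2) + (\beta_1 + \beta_3) X
\end{aligned}
```

Setting \beta_3 = 0 removes the interaction: the two lines then differ only in their intercepts and are parallel.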
  • CI for a Regression Coefficient b ± t-multiple × s_b, with the t-multiple taken for n − k − 1 degrees of freedom.
  • T-Value b = least squares estimate of β; unbiased and normally distributed. β = any of the coefficients. (b − β) / s_b follows a t distribution. t-value = b / s_b, which tests H0: β = 0. The coefficient is significant if the p-value is small. Two-tailed test.
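Both the t-value and the confidence interval for a coefficient follow directly from b and s_b. In this sketch the estimate, its standard error and the degrees of freedom are hypothetical, and the t-multiple is passed in as if read from a t table (2.048 is the two-tailed 95% value for 28 degrees of freedom).

```python
def t_value(b, s_b, beta_0=0.0):
    """Standardized distance of the estimate b from a hypothesized beta_0."""
    return (b - beta_0) / s_b


def confidence_interval(b, s_b, t_multiple):
    """Interval b +/- t_multiple * s_b for the unknown coefficient."""
    margin = t_multiple * s_b
    return b - margin, b + margin


# Hypothetical regression output: n = 30 observations, k = 1 explanatory
# variable, hence n - k - 1 = 28 degrees of freedom.
b, s_b = 2.5, 0.5
T_MULTIPLE_95 = 2.048  # from a t table, 28 dof, two-tailed 95%

t = t_value(b, s_b)
low, high = confidence_interval(b, s_b, T_MULTIPLE_95)

print("t-value:", t)
print("95% CI: ", (low, high))
```

The interval excludes 0, which agrees with the large t-value: both say the coefficient is significant at the 5% level.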
  • Reason for No Dummy Using no dummy variable assumes that the change between steps (e.g. for bathrooms) is the same for each additional bathroom. If neighborhoods 1, 2, 3 have similar steps between them, then no dummy is needed.
  • Answer: Large p-value The variable is redundant. This means that the explanatory variable can be removed from the analysis, because it contains no additional information that is not already contained in the other variables.
  • Principles for the Variance of errors (1) Relevance (2) Data availability. Best theory: a trial and error approach to find a useful set of explanatory variables. There is no single true regression equation.
  • Goodness of Fit Evaluation (1) Observe R Square from the summary output. Interpretation: "About 89% of the variation in Y of the house is explained by X, ... and an indicator of whether the house has (Dummy X)." (2) ANOVA test: To test the fit of the regression line, the hypotheses are set as follows: H0: the fitted line is not a good fit (all slope coefficients are zero). H1: the fitted line is a good fit. Interpretation: "The obtained test statistic (F-ratio) is 20.84 and the corresponding p-value is almost equal to 0, meaning that the result is significant at the 5% level. Hence, we have enough evidence to reject the null hypothesis in favor of the alternative at the 5% significance level."
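The F-ratio of the ANOVA test can be written in terms of R Square, which ties the two parts of this card together: F = (R2 / k) / ((1 − R2) / (n − k − 1)). The sketch below uses R Square = 0.89 from the card, but the sample size and the number of explanatory variables are assumptions, since the card does not state them; the result is therefore not expected to match the 20.84 quoted above.

```python
def f_ratio(r2, n, k):
    """Overall F statistic for a regression with k explanatory variables
    fitted to n observations, expressed through R Square."""
    explained_per_variable = r2 / k
    unexplained_per_dof = (1 - r2) / (n - k - 1)
    return explained_per_variable / unexplained_per_dof


# R Square from the card; n and k are hypothetical.
f = f_ratio(r2=0.89, n=30, k=3)
print("F-ratio:", f)
```

A larger R Square, more observations, or fewer explanatory variables all push the F-ratio up.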
  • Interpretation: Dummy Variable Small p-value: The coefficient for the explanatory variable X implies that, if all other explanatory variables remain constant, the predicted Y of a male would be less than the predicted Y of a female. Large p-value: One extra level of X does not improve your Y. It does not matter whether you have X1 or X2. If you take out X2 (not significant), R Square would go down, but the Adjusted R Square may go up.
  • Reasons for Multiple Regression Build more powerful models by taking other relevant factors into account. Better forecasts -> tighter confidence intervals (smaller errors). See which variables have a significant impact on our "target" variable (hypothesis testing).
  • Interpreting Coefficients - The Not-To-Do List (1) Hold everything else constant for an individual: We cannot claim to be able to "hold everything else constant" for a single individual. While it's mathematically correct, it often just doesn't make any sense, e.g. we can't gain a year of experience or have another child without getting a year older. (2) Interpret regression causally: Regressions are usually applied to observational data. Without deliberately assigned treatments, randomization, and control, we can't draw conclusions about causes and effects. We can never be certain that there are no variables lurking in the background, causing everything we've seen. We have no way of knowing what applying a change to an individual would do. (3) Interpret a regression model as predictive: The term "prediction" suggests extrapolation into the future or beyond the data, and we know that we can get into trouble when we use models to estimate values for x's not in the range of the data. Be careful not to extrapolate very far from the span of your data. (4) Think the sign of a coefficient is special: The sign of the coefficient also depends on the other predictors in the model. Don't look at the sign in isolation and conclude that "the direction of the relationship is positive (or negative)." Just like the value of the coefficient, the sign is about the relationship after allowing for the linear effects of the other predictors. (5) Interpret an insignificant coefficient: If a coefficient's t-statistic is not significant, don't interpret it at all. You can't be sure that the value of the corresponding parameter in the underlying regression model isn't really zero. Assumptions ● Don't fit a linear regression to data that aren't straight. This is the most fundamental regression assumption. If the relationship between the x's and y isn't approximately linear, there's no sense in fitting a linear model to it. What we mean by "linear" is a model of the form we have been writing for the regression.
When we have two predictors, this is the equation of a plane, which is linear in the sense of being flat in all directions. With more predictors, the geometry is harder to visualize, but the simple structure of the model is consistent; the predicted values change consistently with equal size changes in any predictor. Usually we're satisfied when plots of y against each of the x's are straight enough. We'll also check a scatterplot of the residuals against the predicted values for signs of nonlinearity. ● Watch out for the plot thickening. The estimate of the error standard deviation shows up in all the inference formulas. If the error standard deviation changes with x, these estimates won't make sense. The most common check is a plot of the residuals against the predicted values. If plots of residuals against several of the predictors all show a thickening, and especially if they also show a bend, then consider re-expressing y. If the scatterplot against only one predictor shows thickening, consider re-expressing that predictor. ● Make sure the errors are nearly Normal. All of our inferences require that the true errors be modeled well by a Normal model. Check the histogram and Normal probability plot of the residuals to see whether this assumption looks reasonable. ● Watch out for high-influence points and outliers. We always have to be on the lookout for a few points that have undue influence on our model, and regression is certainly no exception. Partial regression plots are a good place to look for influential points and to understand how they affect each of the coefficients.
  • Scatterplot Matrix Displays scatterplots for all pairs of a collection of variables. All the plots in a row have the same variable displayed on their y-axis, and all the plots in a column have the same variable on their x-axis. Usually, the diagonal holds a display of a single variable, such as a histogram or Normal probability plot, and identifies the variable in its row and column.