Data Analytics for Management (Subject) / Hypothesis Testing (Lesson)
This lesson contains 14 flashcards.
This lesson was created by Janaw55.
- Why do we standardize? To measure variables with different means and spreads on a single scale. Standardized scores standardize two things: center (mean 0) and spread (standard deviation 1). The standard deviation is set to 1 specifically so that the z-score measures how many standard deviations an observation lies from the mean, regardless of the size of that deviation. Having adjusted for the spread of the distribution, we can use a single model, the standard normal, to look at the likelihood of any observation (as long as the data are approximately normal). In Excel, NORMSDIST(z) returns the cumulative probability below z, and NORMSINV(p) returns the z-value for a cumulative probability p.
  + Mathematically convenient representation of the variable.
  + Dimensionless.
  + Useful to evaluate different types of random variables with one yardstick.
  + Allows us to use one distribution to compare apples to oranges.
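The standardization idea can be sketched in a few lines of Python. This is only an illustration: the numbers are made up, `z_score` is a hypothetical helper name, and `statistics.NormalDist` from the standard library stands in for the Excel functions NORMSDIST and NORMSINV.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist()

# Two hypothetical observations measured on different scales.
z_exam = z_score(82, mean=70, sd=8)       # exam points
z_height = z_score(190, mean=178, sd=6)   # body height in cm

# Cumulative probability below z, comparable to NORMSDIST(z).
p_below_exam = std_normal.cdf(z_exam)

# z-value for a cumulative probability, comparable to NORMSINV(p).
z_for_975 = std_normal.inv_cdf(0.975)

print(z_exam, z_height, round(p_below_exam, 4), round(z_for_975, 2))
```

Both observations are 12 units above their mean, yet the z-scores differ because the spreads differ, which is the "one yardstick" point of the card.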
- Trade-off between CIs: As the confidence level increases, the CI widens: the higher the percentage of confidence desired, the wider the confidence interval. Less confidence => narrower interval; more confidence => wider interval. Solve by taking a larger sample: the larger the n, the smaller the standard error, and so the narrower the confidence interval. For a CI half as wide as the current one, multiply the sample size n by 4.
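A small numeric check of this trade-off, as a sketch only: it assumes a z-based interval (σ known or a large sample), and `ci_half_width` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width (margin of error) of a z-based CI for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)


w_n100 = ci_half_width(sd=10, n=100)                       # baseline
w_n400 = ci_half_width(sd=10, n=400)                       # 4x the sample
w_conf99 = ci_half_width(sd=10, n=100, confidence=0.99)    # more confidence

print(round(w_n100, 3), round(w_n400, 3), round(w_conf99, 3))
```

Quadrupling n halves the half-width because the standard error shrinks with √n, while moving from 95% to 99% confidence widens the interval at the same n.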
- Estimate required sample size (answer template): If you wanted to be 95% confident that the error in the estimate of p would be no more than 5%, then the standard sample size formula requires n to be 300.
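The card does not spell out the formula. A common textbook version for a proportion is n = z² · p(1−p) / E², rounded up; the sketch below assumes that formula, and `required_sample_size` is a hypothetical helper. The result depends on the planning value of p, which the card does not state, so the card's figure of 300 presumably rests on an estimate of p that is not shown here.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n so that the margin of error for a proportion is <= margin.

    p is a planning value for the proportion; p = 0.5 is the most
    conservative choice because it maximizes p * (1 - p).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_conservative = required_sample_size(margin=0.05, p=0.5)
n_with_prior = required_sample_size(margin=0.05, p=0.2)

print(n_conservative, n_with_prior)
```

With no prior knowledge of p, the conservative choice gives the largest n; a prior estimate further from 0.5 reduces the required sample.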
- Interpretation of a confidence interval: If we had enough resources to select 100 random samples of size 200 from the population of X and computed the 100 sample means with their CIs, about 95 out of the 100 CIs would contain the unknown true population mean. Hence, we can be 95% confident that an interval constructed this way captures the true population mean.
- Large p-value, rationale to not reject the null hypothesis: Our t-value is smaller (in absolute value) than t-critical and our p-value is greater than the significance level alpha, meaning that our result is not statistically significant. We fail to reject the null hypothesis (p-value > alpha) and conclude that not enough evidence is available to suggest the null is false at the 5% significance level (95% confidence level). The p-value measures how unlikely the observed sample results are when the null hypothesis is true: it is the probability of a result at least as extreme as the one observed, assuming the null is true.
- Small p-value, rationale to reject the null hypothesis: Our t-value is very far from the hypothesized value and larger (in absolute value) than t-critical. As our t-value is large, our p-value is very small, which means that the result is statistically significant. Hence, we have enough evidence to reject the null (p-value <= alpha) and conclude that the data support the alternative hypothesis at the 5% significance level (95% confidence level).
- Target value not included in CI: The CI extends from X to Y. However, it does not include the target value Z. Thus, the null hypothesis is rejected at the 5% significance level. Conclusively, we can say that the CI estimate suggests that there has been a significant change in the proportion of students who HA.
- Overlapping CIs: Non-overlapping CIs: if the CIs do not overlap, this indicates a statistically significant difference between the means at the 5% level of significance. Overlapping CIs: we cannot conclusively say that this machine is faster than the other on average because the CIs overlap. When the CIs overlap, there may still be a statistically significant difference between the means, but this is not necessarily true; a direct test of the difference is needed to decide.
- Two-sample t-test: Question: how many standard errors (SE) is the sample mean difference from the hypothesized mean difference of 0? Finding: we see that the sample mean difference is 2.48 standard errors to the left of 0 (t Stat = -2.48). Meaning of t-critical: in order to reject the null, the sample mean difference needs to be more than t-critical SEs away from the hypothesized difference. The more standard errors the "t Stat" is away from 0 (the hypothesized value) and the smaller the p-value, the more convinced we are about rejecting the null hypothesis.
  t-stat = point estimate / standard error
  Point estimate = x̄1 - x̄2
  SE for the difference between two means, assuming unequal variances = SQRT(s1^2 / n1 + s2^2 / n2)
  dof = n1 + n2 - 2 (this count belongs to the pooled, equal-variance test; with unequal variances the Welch approximation gives the dof)
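The formulas on this card can be sketched as follows. The data are hypothetical, `welch_t` is an illustrative helper name, and the degrees of freedom use the Welch-Satterthwaite approximation, which is an addition not taken from the card. The standard library has no t-distribution, so the comparison with t-critical or the p-value would still need a table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """t statistic, standard error and Welch dof for H0: mean1 - mean2 = 0."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # sample variances (n - 1)

    # SE of the difference between two means, unequal variances.
    se = sqrt(v1 / n1 + v2 / n2)

    # Point estimate divided by its standard error.
    t_stat = (mean(sample1) - mean(sample2)) / se

    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t_stat, se, dof


# Hypothetical measurements for two groups.
group_a = [12, 14, 11, 13, 15]
group_b = [16, 18, 17, 19, 15]

t_stat, se, dof = welch_t(group_a, group_b)
print(t_stat, se, dof)
```

In this example both groups have the same size and the same sample variance, so the Welch dof coincides with n1 + n2 - 2 = 8; with unequal variances or sizes it would be smaller.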
- One-sample test: Tests whether the mean of a normally distributed population is different from a specified value, for small samples (n < 30) when the population standard deviation is not known. Here the standard deviation of the population, σ, is unknown and we estimate it with the sample standard deviation, s. For a large sample (n > 30) we can use the normal distribution. For a small sample, we assume a normally distributed population and use the t-distribution; if the data are not normally distributed, a non-parametric method is used instead. The t-distribution is more accurate than the normal distribution here because it uses a wider interval to reflect the extra uncertainty and takes the size of the sample into account. As the degrees of freedom grow (roughly beyond 30), t-critical moves closer to the z critical value.
- Directional & non-directional tests: Non-directional hypothesis: two-tailed test. Assesses whether an effect occurs or not, but not the direction of the effect; we divide the 5% error we allow ourselves between the two tails (2.5% each). Examples: lie or not, difference in shooting baskets yes or no. Directional hypothesis: one-tailed test. Assesses which group will score higher, e.g. do women spend MORE on their haircut than men? We place the full 5% error we allow ourselves in one tail. We need a less extreme value to reject the null (more than 1.645 instead of 1.96), but it must be in the right direction.
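The two critical values can be reproduced with the standard normal distribution. This is a sketch assuming a z-test at alpha = 0.05; the function names and the test statistics 1.8 and -1.8 are hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Two-tailed: alpha is split, 2.5% in each tail.
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)

# One-tailed: the full 5% sits in one tail.
one_tailed_crit = std_normal.inv_cdf(1 - alpha)


def reject_two_tailed(z_stat):
    """Reject H0 if the statistic is extreme in either direction."""
    return abs(z_stat) > two_tailed_crit


def reject_upper_tailed(z_stat):
    """Reject H0 only if the statistic is extreme in the upper tail."""
    return z_stat > one_tailed_crit


print(round(two_tailed_crit, 3), round(one_tailed_crit, 3))
print(reject_two_tailed(1.8), reject_upper_tailed(1.8), reject_upper_tailed(-1.8))
```

A statistic of 1.8 is rejected by the upper-tailed test but not by the two-tailed one, and -1.8 is not rejected by the upper-tailed test at all because it points in the wrong direction.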
- Types of errors: decisions may be incorrect in two ways. Type I error (false alarm, false positive): the state is that H0 is true, and we reject a true H0. Example: being accused of a crime one did not commit. Its probability is the significance level α. Type II error (false negative): the state is that H0 is false, and we fail to reject a false H0. Example: a wrong test result that hides a pregnancy. If the null hypothesis is false, then the probability of a Type II error is called β (beta). The probability of correctly rejecting a false null hypothesis equals 1 - β and is called power.
- Rules of testing (which statistic to use, with its standard error):
  - σ known:
    - n > 30 => z-score: σ / √n
    - n < 30 => z-score: σ / √n
  - σ unknown:
    - n > 30 => z-score: s / √n
    - n < 30 => t-critical: s / √n
  - Proportions:
    - p*n > 5 (and n*(1-p) > 5) => z-score: √(p*(1-p) / n), where p is the hypothesized p
    - p*n < 5 => do not approximate with the normal distribution
  - Binomial (counts):
    - p*n > 5 (and n*(1-p) > 5) => z-score: √(n*p*(1-p))
    - p*n < 5 => do not approximate with the normal distribution
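As an illustration of the proportions rule, here is a minimal sketch of a two-tailed one-proportion z-test. The counts are made up and `proportion_z_test` is a hypothetical helper. Besides the card's p*n > 5 condition it also checks n*(1-p) > 5, a common companion condition that is an assumption added here.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-tailed z-test for H0: p = p0, using the hypothesized p0 in the SE."""
    # Normal approximation only when both expected counts exceed 5.
    if n * p0 <= 5 or n * (1 - p0) <= 5:
        raise ValueError("sample too small for the normal approximation")

    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)     # hypothesized p, not the sample p
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5.
z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 2), round(p_value, 4))
```

Here the p-value falls just below 0.05, so the null would be rejected at the 5% significance level in a two-tailed test.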
- Sample variance: s² = sample variance = (1 / (n - 1)) ∑ (xi - x̄)², and the sample standard deviation s is its square root. For proportions: always use the hypothesized p in the standard error!
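The sample variance formula can be checked by hand against Python's standard library. The data are hypothetical and `sample_variance` is an illustrative name; `statistics.variance` uses the same n - 1 divisor.

```python
from math import sqrt
from statistics import variance


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


data = [4, 8, 6, 5, 7]            # hypothetical observations, mean 6

s_squared = sample_variance(data)  # sample variance
s = sqrt(s_squared)                # sample standard deviation

print(s_squared, variance(data), round(s, 3))
```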
