Data Analytics for Management (Subject) / Sampling Distribution (Lesson)

There are 13 cards in this lesson


This lesson was created by Janaw55.


  • Confidence Interval for Average (Mean) Salaries. Do we have to make any assumptions?
    For the average income of an AVERAGE manager, we can take shelter behind the CLT: if n is sufficiently large, the distribution of the sample mean approximately follows a Normal Distribution, estimated with mean x̄ and standard deviation (standard error) s/SQRT(n). No Normality assumption on the individual incomes is needed.
    Average Manager: 95% CI = Point Estimate ± 1.96 x STANDARD ERROR, where STANDARD ERROR = s/SQRT(n)
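The calculation on the average-salary card can be sketched in Python as follows. This is a minimal illustration, not part of the original lesson: the function name `mean_confidence_interval` and the salary figures are made up, and a sample of 10 is smaller than one would normally want when leaning on the CLT.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the population mean, relying on the CLT."""
    n = len(sample)
    x_bar = statistics.mean(sample)       # point estimate
    s = statistics.stdev(sample)          # sample standard deviation
    se = s / math.sqrt(n)                 # standard error of the mean
    return x_bar - z * se, x_bar + z * se

# Hypothetical salaries (in thousands), invented for illustration only
salaries = [52, 61, 58, 70, 66, 49, 73, 64, 57, 60]
lower, upper = mean_confidence_interval(salaries)
print(f"95% CI for the mean salary: {lower:.2f} to {upper:.2f}")
```

The interval is built from the standard error s/SQRT(n), so it narrows as the sample grows, whereas the spread of individual salaries does not.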
  • Confidence Interval for a Single Random Manager's Income. Do we have to make any assumptions?
    For the income of a single random manager, we have to assume that incomes are symmetric like the Normal Distribution. This is what enables us to use the Z-multiple of the Normal Distribution (1.96 SD) to get a 95% CI for his income. However, this does not hold exactly, as incomes could have a different shape (e.g. right-skewed).
    Random Manager: 95% CI = Point Estimate ± 1.96 x STANDARD DEVIATION
  • Binomial -> Approximate with Normal; Continuity Correction
    Requirements:
    (1) Discrete Distribution
    (2) Measurements with only two outcomes: "success" with probability p and "failure" with probability (1-p)
    (IF) n*p > 5 and n*(1-p) > 5 -> Approximation with Normal Distribution. If n is large and p is not too close to 0 or 1, the Distribution is bell-shaped.
    Needs Continuity Correction: each discrete Binomial value is expanded into an interval, so the Normal range is extended by 0.5 at each edge (e.g. P(X <= k) uses k + 0.5, P(X >= k) uses k - 0.5).
    (IF) the population is small (sampling without replacement) -> Hypergeometric Distribution
    EXCEL FORMULA
    BINOMDIST(k,n,p,cum), where
    - n is the number of trials
    - p is the probability of "success"
    - k is the integer number of successes that we specify
    - cum = 1: k successes or fewer (less than or equal to k); cum = 0: exactly k successes
    Alternatively, use NORMDIST.
    MEANING
    Probability of more than 200 is 1 - Prob of less than or equal to 200.
    Probability of at least 200 is 1 - Prob of less than or equal to 199.
    mean = n*p
    SD for Binomial = SQRT(n*p*(1-p))
    CI for a proportion, with SE = SQRT(p*(1-p)/n):
    Upper 95 = p + Z * SE
    Lower 95 = p - Z * SE
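To make the Binomial card concrete, the sketch below compares the exact Binomial probability with its Normal approximation using the 0.5 continuity correction. The helper names and the values n = 400, p = 0.5 are assumptions chosen for illustration. `binom_cdf` plays the role of BINOMDIST(k, n, p, 1), and `normal_cdf` the role of a cumulative NORMDIST.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    """Cumulative Normal probability P(Y <= x)."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def binom_cdf_normal_approx(k, n, p):
    """Normal approximation to P(X <= k), continuity-corrected to k + 0.5."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mean, sd)

n, p = 400, 0.5  # hypothetical values; n*p and n*(1-p) are both well above 5

# "More than 200" is 1 - P(X <= 200); "at least 200" is 1 - P(X <= 199)
exact_more_than_200 = 1 - binom_cdf(200, n, p)
approx_more_than_200 = 1 - binom_cdf_normal_approx(200, n, p)
exact_at_least_200 = 1 - binom_cdf(199, n, p)

print(exact_more_than_200, approx_more_than_200, exact_at_least_200)
```

With these values the exact and approximate answers agree closely, which is what the n*p > 5 and n*(1-p) > 5 rule of thumb is meant to ensure.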
  • Other Random Samples
    For comparison: a simple random sample of size n is one where each possible sample of size n has the same chance of being chosen.
    Other random sampling methods:
    - Frequently used
    - More efficient use of a given sample size
    - Not every sample has the same chance of being chosen
    - More or less representative, depending on the data (good for: ordered list of sales; bad for: weekdays)
  • Sampling Methods: Simple Random Sample
    A simple random sample of size n is one where each possible sample of size n has the same chance of being chosen.
    + Straightforward
    + Easy with RAND()
    - Not often used in practice
    - Requires a list of the population prior to sampling (the list itself is prone to bias)
    - Costly
    - Possible that the sampled units are spread out
    - Sometimes over- or underrepresentation of a group
    One Way:
    - List N households in alphabetical order
    - Allocate each a number between 1 and N
    - Use the Excel function RANDBETWEEN(1, N) to generate 150 numbers
    Other Way:
    - List N households
    - Use the =RAND() function to generate a random number next to each household
    - Sort the households by their random numbers and pick the first 150
    These 150 households will then be contacted, data collected, and statistics calculated.
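The two Excel procedures on the simple-random-sample card can be mirrored in Python roughly as below. This is a sketch under assumptions: the household names, the population size of 1000 and the function names are invented. Note one deliberate difference: `rng.sample` draws distinct positions, whereas repeated RANDBETWEEN calls can return the same number twice and would then need redrawing.

```python
import random

def pick_by_random_index(households, k, rng):
    """'One way': number the list 1..N and draw k distinct numbers."""
    n_total = len(households)
    positions = rng.sample(range(1, n_total + 1), k)  # distinct numbers in 1..N
    return [households[i - 1] for i in positions]

def pick_by_random_sort(households, k, rng):
    """'Other way': attach a random number to each household, sort, take the first k."""
    keyed = [(rng.random(), h) for h in households]   # analogue of =RAND() per row
    keyed.sort(key=lambda pair: pair[0])
    return [h for _, h in keyed[:k]]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
households = sorted(f"household_{i:04d}" for i in range(1, 1001))  # hypothetical list

sample_a = pick_by_random_index(households, 150, rng)
sample_b = pick_by_random_sort(households, 150, rng)
print(len(sample_a), len(sample_b))
```

Both routes give every set of 150 households the same chance of being chosen, which is the defining property of a simple random sample.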
  • Probability Sample vs. Judgemental Sample
    Probability Sample:
    - Members of a probability sample are chosen according to a random mechanism
    - Use probability to make inferences
    - Less prone to biases
    Judgemental Sample:
    - Members of a judgemental sample are chosen according to the sampler's judgement
    - Rules of probability do not apply
    - Prone to biases
  • Sampling Methods, Non-Simple: Cluster & Stratified Sampling
    Stratified Sampling:
    - Population is divided into relatively homogeneous subsets called strata, with high variation between strata and small variation within strata
    - Random samples are taken from each stratum using proportional sample sizes
    Cluster Sampling:
    - Population is separated into clusters, such as cities or city blocks, and then a random sample of the clusters is selected
    + Sample convenience
    + Lower cost (in one area)
    - Can be less accurate
    - Narrow view
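Proportional stratified sampling, as described on the card above, might look like this in Python. The strata names, their sizes and the function name `stratified_sample` are hypothetical. The rounding step is a simplification: with less convenient proportions the per-stratum sizes may not add up exactly to the requested total.

```python
import random

def stratified_sample(strata, total_size, rng):
    """Draw a random sample from each stratum, sized in proportion to the stratum."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation: stratum share of the population times total size
        k = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

rng = random.Random(0)  # fixed seed for reproducibility
strata = {  # hypothetical strata of firms by size
    "small": [f"small_{i}" for i in range(600)],
    "medium": [f"medium_{i}" for i in range(300)],
    "large": [f"large_{i}" for i in range(100)],
}
sample = stratified_sample(strata, 50, rng)
print({name: len(chosen) for name, chosen in sample.items()})
```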
  • Experimental Research Methods
    Confounding Variables (Tertium Quid): a variable, which we may or may not have measured, other than the predictor variables, that potentially affects an outcome variable.
  • Randomization in Experiments
    Randomized Experiment: in a randomized experiment, subjects are randomly assigned a treatment.
    - The study sample is divided into one group that will receive the intervention (treatment group) and one group that will not receive the intervention (control group)
    - Differences in response should reflect the effectiveness of the intervention
    Purpose of Randomization:
    - Automatically control for confounding variables, which should balance out across the groups
    - Establish a cause and effect relationship
    - The evidence is better supported
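Random assignment to treatment and control groups, as described on the randomization card, can be sketched in a few lines of Python. The subject labels, the group size and the function name `randomly_assign` are assumptions for illustration only.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle the subjects and split them into a treatment and a control group."""
    shuffled = list(subjects)      # copy so the original order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(7)  # fixed seed for reproducibility
subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical study sample
treatment, control = randomly_assign(subjects, rng)
print(len(treatment), len(control))
```

Because chance alone decides who lands in which group, measured and unmeasured confounders should be spread evenly across the two groups on average.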
  • Experimental Research
    One or more variables are systematically manipulated to see their effect (alone or in combination) on an outcome variable.
    Effect: an effect should be present when the cause is present, and when the cause is absent the effect should be absent, too.
    - Control conditions: the cause is absent
    - Treatment conditions: the proposed cause is present
    Subjects are randomly assigned a treatment and the effect of the treatment is examined. Under random assignment the groups should not differ significantly with respect to potential confounding variables. Using randomization in an experiment can establish a cause and effect relationship.
    Cause and Effect:
    - Cause and effect must occur close together in time (contiguity)
    - The cause must occur before the effect does
    - The effect should never occur without the presence of the cause
    - Statements can be made about cause and effect
    - Research designs are repeatable -> results can be checked and verified
    - In laboratory research, conditions not found in a natural setting can be created in an experimental setting that allows for greater control of extraneous variables
    - There are many variations of experimental research, and the researcher can tailor the experiment while still maintaining the validity of the design
    - Error also plays a key role in the validity of a project, as discussed in previous modules
    - Experimental research is a powerful tool for determining or verifying causation, but it typically cannot specify "why" the outcome occurred
    - Research must adhere to ethical standards in order to be valid
  • Sampling Error and Biases
    - Non-Response Bias
    - Non-Truthful Response Bias -> Randomized Response
    - Measurement Error -> Wording
    - Voluntary Response Bias
  • Concerns about the simple random sampling method
    Concerns about the method:
    - the sample is possibly small;
    - there might be no answer to the calls, or people may refuse to answer;
    - some answers may be untruthful;
    - by chance, the sample may not be representative of the population, e.g. by happening to contact a majority of very rich or very poor people.
    Main concerns about errors:
    - no indication that the sample is very small compared to the population;
    - no identification of possible non-respondents;
    - no identification of the risk of untruthful answers.
  • Random Variables & Probability Distribution
    Statistics is the science of "random variables": events or measurements which contain randomness and may take a range of values.
    - "Random variables" are not completely "random"
    - Typically the possible values might centre on an "average" value
    - There will be a certain "range" or spread around this average
    - Some values are more likely than others
    We use "probability distributions" to describe the set of possible values and the probability of each occurring.
    - In practice we may have a limited "sample" of observations
    - We use these to estimate "statistics" which measure properties of the underlying "true" distribution
    - The "true" distribution is also known as the "population" distribution because it is what we would get if our sample was the entire population