0:18:45 - Statistical Sin: Lying with Pictures (Non-comparable Data)
- Example: Fox News chart comparing welfare recipients vs individuals with full-time jobs.
- The Y-axis is unlabeled, and the baseline appears not to be zero.
- Definitions are non-comparable.
- "People on welfare" counts all household members if someone gets welfare.
- "People with a full-time job" counts people who have a job.
- This comparison creates a very misleading impression.
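A small sketch of why the two counts above are not comparable. The household records and field names here are hypothetical toy data invented for illustration, not figures from the chart.

```python
# Hypothetical toy data (not from the chart): one record per household.
households = [
    {"members": 4, "recipients": 1, "full_time_workers": 1},
    {"members": 3, "recipients": 1, "full_time_workers": 0},
    {"members": 2, "recipients": 0, "full_time_workers": 2},
    {"members": 5, "recipients": 0, "full_time_workers": 1},
]

# Household-level definition: everyone in a household counts
# if at least one member receives the benefit.
on_welfare_household_def = sum(
    h["members"] for h in households if h["recipients"] > 0
)

# Individual-level definition: only the recipients themselves count.
on_welfare_individual_def = sum(h["recipients"] for h in households)

# Full-time workers are counted individually.
full_time_workers = sum(h["full_time_workers"] for h in households)

print("On welfare (household definition): ", on_welfare_household_def)
print("On welfare (individual definition):", on_welfare_individual_def)
print("Full-time workers (individuals):   ", full_time_workers)
```

With the same underlying people, the household definition reports 7 against 4 workers, while the individual definition reports 2 against 4. Only the second pair uses the same unit of counting.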
0:20:15 - Moral: Are the things you're comparing actually comparable?
- This is a standard statistical sin.
0:20:20 - Statistical Sin: GIGO (Garbage In, Garbage Out)
- Meaning: If you've got rubbish data in, you get rubbish results out.
- Charles Babbage anecdote about his computational engine.
0:21:25 - Example: 1840s US Census on Slavery and Insanity
- John Calhoun employed census figures to assert slavery benefited slaves.
- This was challenged by John Quincy Adams.
- Calhoun later acknowledged census mistakes but asserted they would average out.
- Rebuttal: Errors average out only if they are unbiased and independent; these errors were systematic (biased), so they did not.
- The data was inherently flawed.
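The "errors average out" claim can be checked with a short simulation. This is a minimal sketch under assumed numbers: the true value, the noise level, and the size of the bias are all hypothetical, and the helper `mean_error` is introduced here only for illustration.

```python
import random

def mean_error(true_value, n, bias, noise_sd, seed=0):
    """Error of the sample mean after n noisy measurements.

    Each measurement is true_value + bias + Gaussian noise.
    bias == 0 models unbiased, independent errors;
    bias != 0 models a systematic error shared by every measurement.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += true_value + bias + rng.gauss(0, noise_sd)
    return total / n - true_value

# Hypothetical settings: true value 50, noise sd 5, systematic bias +3.
for n in (100, 10_000, 100_000):
    unbiased = mean_error(50.0, n, bias=0.0, noise_sd=5.0)
    biased = mean_error(50.0, n, bias=3.0, noise_sd=5.0)
    print(f"n={n:>7}: unbiased error {unbiased:+.3f}, biased error {biased:+.3f}")
```

As n grows, the unbiased error shrinks toward zero, while the biased error settles near the bias itself. Collecting more data does not repair a systematic error.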
0:23:05 - Moral of GIGO: Analysis of Bad Data
- Analysis of bad data is worse than no analysis at all.
- People often apply sound statistical analysis to unsound data and arrive at incorrect conclusions.
- First question: Is the data worthwhile to analyze?
0:23:55 - Statistical Sin: Survivor Bias
- Photo of a World War II fighter aircraft.
- Examining damage on aircraft that returned to determine where to place armor.
- Flaw: Should have examined the planes that were shot down (the non-survivors).
- The sample (planes that flew back) is not representative of all planes (including the ones downed).
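The aircraft example can be sketched as a simulation. The loss probabilities per section below are hypothetical, chosen only to make the effect visible, and the model (one uniformly placed hit per plane) is a deliberate simplification.

```python
import random

# Hypothetical chance that a single hit to each section downs the plane.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}

def simulate(n_planes=20_000, seed=1):
    """Return (hits on all planes, hits seen on returning planes) per section."""
    rng = random.Random(seed)
    sections = list(LOSS_PROB)
    all_hits = dict.fromkeys(sections, 0)
    survivor_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        section = rng.choice(sections)  # one hit per plane, placed uniformly
        all_hits[section] += 1
        if rng.random() >= LOSS_PROB[section]:  # plane survives and returns
            survivor_hits[section] += 1
    return all_hits, survivor_hits

all_hits, survivor_hits = simulate()
for section in LOSS_PROB:
    print(f"{section:>8}: hit {all_hits[section]:>5} times, "
          f"seen on returning planes {survivor_hits[section]:>5} times")
```

Hits are spread evenly across sections, yet the returning planes show far fewer engine and cockpit hits. Armoring where the survivors show damage would protect the sections that matter least.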
0:25:10 - Survivor Bias in Sampling
- Problem whenever sampling is used to make inferences about a population.
- Statistical methods are based on random sampling.
- Convenience sampling is typically not random.
- Examples: Course feedback (students who dropped out are not sampled), marks (failing students drop out).
0:25:55 - Statistical Sin: Non-response Bias
- A further category of non-representative sampling that occurs in opinion polls and surveys.
- Respondents to surveys are not representative of the entire population.
0:26:30 - Problem with Non-Random/Non-Independent Samples
- Still able to calculate simple statistics (mean, std dev).
- Cannot draw conclusions using the Empirical Rule, the Central Limit Theorem, or the standard error, because all three assume random, independent samples and that assumption is violated.
- Example: Political polls that use landlines are leaving out a big chunk of the population (younger individuals).
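The landline example can be sketched as follows. All rates here (share of younger voters, landline ownership, support levels) are hypothetical values chosen for illustration, and the helper names are not from the lecture.

```python
import random
import statistics

def make_population(n=100_000, seed=2):
    """Build a list of (has_landline, vote) pairs under assumed rates."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        young = rng.random() < 0.4                       # assumed 40% younger voters
        has_landline = rng.random() < (0.1 if young else 0.7)
        supports = rng.random() < (0.65 if young else 0.40)
        population.append((has_landline, 1 if supports else 0))
    return population

def poll(voters, k, rng):
    """Sample k voters; return the sample mean and its standard error."""
    votes = [vote for _, vote in rng.sample(voters, k)]
    mean = statistics.mean(votes)
    se = statistics.stdev(votes) / k ** 0.5
    return mean, se

population = make_population()
true_support = statistics.mean(vote for _, vote in population)
landline_only = [p for p in population if p[0]]

rng = random.Random(3)
random_mean, random_se = poll(population, 2000, rng)
landline_mean, landline_se = poll(landline_only, 2000, rng)

print(f"True support:         {true_support:.3f}")
print(f"Random sample:        {random_mean:.3f} +/- {1.96 * random_se:.3f}")
print(f"Landline-only sample: {landline_mean:.3f} +/- {1.96 * landline_se:.3f}")
```

Both polls report a similarly narrow interval, but only the random sample's interval can be trusted. The standard error measures sampling noise, not the bias introduced by who was reachable.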
0:27:35 - Moral of Sampling Issues
- Always know how data was gathered and what the analysis is assuming.
- Be very cautious with conclusions when assumptions are not met.
0:27:50 - Conclusion and What's Coming Up Next
- The next lecture will finish the statistical sins and wrap up the course.