Correlation Is Not Causation (And Other Traps)
P-values, confidence intervals, A/B tests — the statistical concepts you need to navigate a data-driven world.
- Correlation vs. causation with real examples
- What p-values and confidence intervals actually mean
- Why most published research findings are false
- How tech companies use A/B testing at scale
Correlation and causation are not the same thing
Correlation vs. causation
Correlation means two variables tend to move together. Causation means that changing one variable produces a change in the other.
A correlation coefficient can be positive, negative, or near zero. But even a strong correlation does not prove a cause.
Common traps
- Hidden confounder: a third variable affects both outcomes
- Reverse causation: the effect is actually driving the cause
- Coincidence: large datasets produce spurious matches
Real example
Ice cream sales and drowning deaths both rise in summer. Heat increases both beach trips and ice cream purchases. The shared cause is temperature, not ice cream.
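The ice cream example can be sketched as a small simulation. The numbers below are invented for illustration only (they are not real sales or drowning data): temperature drives both series, and neither series influences the other. The sketch compares the raw correlation with the correlation left over after the temperature trend is removed from each series.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)

# Assumed toy model: temperature is the only shared cause.
temperature = [rng.uniform(5, 35) for _ in range(2000)]
ice_cream = [20 * t + rng.gauss(0, 100) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 2) for t in temperature]

raw_r = pearson(ice_cream, drownings)
adjusted_r = pearson(residuals(ice_cream, temperature),
                     residuals(drownings, temperature))

print(f"raw correlation:               {raw_r:.2f}")
print(f"after removing temperature:    {adjusted_r:.2f}")
```

In this toy setup the raw correlation is strong, while the adjusted correlation sits close to zero, which is what a hidden confounder looks like once you control for it.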
P-values and confidence intervals are about evidence, not certainty
What a p-value means
The p-value is the probability of data at least as extreme as what you observed, assuming the null hypothesis is true.
If p = 0.03, that does not mean:
- there is a 3% chance the null is true
- there is a 97% chance the result is true
It means that, in a no-effect world, data at least this extreme would show up only about 3% of the time.
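To make the definition concrete, here is a minimal sketch of an exact p-value calculation. The scenario is hypothetical and not from the text above: a coin is flipped 100 times and lands heads 61 times, and the null hypothesis is that the coin is fair. The function name is also just an illustrative choice.

```python
from math import comb

def two_sided_binomial_p(heads, flips):
    """Exact two-sided p-value under a fair-coin null (p = 0.5).

    Adds up the probability of every outcome that is at least as far
    from the expected count as the observed one.
    """
    expected = flips / 2
    distance = abs(heads - expected)
    extreme = sum(comb(flips, k) for k in range(flips + 1)
                  if abs(k - expected) >= distance)
    return extreme / 2 ** flips

p_value = two_sided_binomial_p(61, 100)
print(f"p = {p_value:.4f}")
```

The result lands below 0.05. It is the probability of 61 or more heads, or 39 or fewer, from a fair coin. It is not the probability that the coin is fair.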
What a confidence interval means
A 95% confidence interval is a range produced by a method that captures the true parameter in 95% of repeated samples.
A narrow interval usually means more precision. A wide interval means more uncertainty.
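The "repeated samples" idea can be checked by simulation. This sketch assumes a normal population with a known true mean (made-up values), builds an approximate 95% interval from each sample using the usual mean plus or minus 1.96 standard errors, and counts how often the interval captures the true mean.

```python
import random
import statistics

def coverage(true_mean=10.0, sd=3.0, n=100, repeats=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Normal-approximation half-width: 1.96 standard errors.
        half_width = 1.96 * statistics.stdev(sample) / n ** 0.5
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / repeats

rate = coverage()
print(f"intervals containing the true mean: {rate:.1%}")
```

The coverage rate should come out close to 95%. Any single interval either contains the true value or does not; the 95% describes the method's long-run hit rate.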
Why sample size changes everything
A tiny effect can become statistically significant in a huge sample. That is why you should always ask about effect size, uncertainty, and practical meaning.
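A short calculation shows how this happens. The sketch below assumes a two-group comparison with equal group sizes and a known standard deviation, and uses a two-sided z-test; the effect of 0.02 standard deviations is an arbitrary illustrative value.

```python
from math import erfc, sqrt

def two_sample_z_p(effect_in_sd, n_per_group):
    """Two-sided p-value for a difference in means, in standard-deviation units.

    Assumes equal group sizes and a known standard deviation, so the
    standard error of the difference is sqrt(2 / n_per_group).
    """
    z = effect_in_sd / sqrt(2 / n_per_group)
    return erfc(abs(z) / sqrt(2))

# Same tiny effect, very different sample sizes.
for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}: p = {two_sample_z_p(0.02, n):.3g}")
```

The effect never changes, only the sample size does, yet the p-value falls from clearly non-significant to vanishingly small. Statistical significance here says nothing about whether a 0.02 standard deviation difference matters in practice.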
Why many published findings do not survive contact with reality
Why false positives happen
If you run 20 independent tests at a 5% threshold and every null hypothesis is actually true, you still expect about 1 false positive on average.
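The arithmetic behind that figure is simple enough to sketch, assuming the tests are independent and every null hypothesis is true. The function names are illustrative.

```python
def expected_false_positives(tests, alpha):
    """Average number of false positives when every null is true."""
    return tests * alpha

def prob_at_least_one(tests, alpha):
    """Chance that at least one of the independent tests is a false positive."""
    return 1 - (1 - alpha) ** tests

print(expected_false_positives(20, 0.05))       # about 1
print(round(prob_at_least_one(20, 0.05), 2))    # about 0.64
```

So with 20 tests and no real effects anywhere, there is roughly a 64% chance of seeing at least one "significant" result.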
Problems get worse when researchers:
- test many outcomes
- try many subgroups
- stop collecting data when the result looks good
- publish only successful findings
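One of these habits, stopping as soon as the result looks good, can be illustrated with a simulation. This is a sketch under simplified assumptions: the null hypothesis is true (data are pure noise with mean 0 and standard deviation 1), and the researcher runs a z-test either once at the end or every 10 observations, stopping at the first "significant" look.

```python
import random

def false_positive_rate(peek_every, max_n=100, sims=2000, seed=2):
    """Share of null experiments declared significant at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0, 1)  # the null is true: no real effect
            # z-statistic for the running mean with known sd = 1
            if n % peek_every == 0 and abs(total / n ** 0.5) > 1.96:
                rejections += 1
                break  # stop collecting data once it "looks good"
    return rejections / sims

fixed = false_positive_rate(peek_every=100)   # one look, at the end
peeking = false_positive_rate(peek_every=10)  # ten looks along the way
print(f"single look: {fixed:.1%}   peeking every 10: {peeking:.1%}")
```

With a single planned look the false positive rate stays near the nominal 5%. With repeated peeking it climbs well above that, even though nothing real is being detected.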
Key paper
John Ioannidis published "Why Most Published Research Findings Are False" in 2005 in PLOS Medicine.
A/B testing turns guesswork into measurement
What A/B testing does
A/B testing compares two versions under random assignment.
It helps answer a causal question: did version B cause a change in behavior?
Why randomization matters
Random assignment makes the groups similar on average, so differences are more likely to come from the treatment itself rather than hidden confounders.
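A quick simulation can show what "similar on average" means. The sketch assumes a hypothetical hidden trait, here called engagement, that the experimenter never measures. Users are assigned to groups by a coin flip, and the group averages of the hidden trait are compared afterwards.

```python
import random

rng = random.Random(3)

# Hypothetical hidden trait the experimenter cannot see.
engagement = [rng.gauss(50, 15) for _ in range(20_000)]

# Coin-flip assignment: True means the user goes to group B.
assignment = [rng.random() < 0.5 for _ in engagement]

group_a = [e for e, in_b in zip(engagement, assignment) if not in_b]
group_b = [e for e, in_b in zip(engagement, assignment) if in_b]

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
balance_gap = abs(mean_a - mean_b)

print(f"group A mean: {mean_a:.2f}  group B mean: {mean_b:.2f}")
```

The two averages end up very close without anyone measuring or matching on the trait. That is the property that lets a difference in outcomes be attributed to the treatment.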
Example
If a product team tests a new checkout button on 1,000,000 visitors and sees conversion rise from 4.8% to 5.0%, the absolute lift is 0.2 percentage points, or about 4% in relative terms. That sounds tiny, but at scale it adds up: roughly 2,000 extra purchases for every million visitors who see the new version.
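Whether a lift like this is distinguishable from noise can be checked with a two-proportion z-test. The sketch below assumes the 1,000,000 visitors were split evenly, 500,000 per version (the split is an assumption, not stated in the example), and uses the pooled-variance form of the test.

```python
from math import erfc, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, erfc(abs(z) / sqrt(2))

# 4.8% of 500,000 = 24,000 conversions; 5.0% of 500,000 = 25,000.
z, p = two_proportion_z(24_000, 500_000, 25_000, 500_000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

Under these assumptions the difference is far outside what chance alone would typically produce. The same 0.2-point lift measured on a few thousand visitors would not be.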
A practical checklist for reading data claims
Read claims in this order
- What question was actually tested?
- Was the design observational or experimental?
- What is the effect size?
- What is the confidence interval?
- Was there replication?
- Were many analyses tried?
Fast rule of thumb
Strong claims need strong designs. A flashy p-value is not enough.

The bottom line
Correlation is a clue, not a conclusion.
P-values measure surprise under a null model.
Confidence intervals show plausible effect sizes.
A/B tests, when randomized and well run, can support causal claims.
The best analysts do not worship a single number. They ask what the data can and cannot prove.