Effect Size (Cohen’s d) Explained Simply

A plain‑English guide to effect size and Cohen’s d: what it is, why it matters, when to use it, and how to interpret it, with a quick worked example.

4 min read
statistics
a-b-testing
product

Why this exists

“Significant” doesn’t tell you if a result is big enough to matter. With enough data, even a tiny, useless change can be statistically significant. Effect size fixes that by answering a different question: How large is the difference?

What is effect size?

It’s a single number that describes the magnitude of a difference or relationship. For comparing two averages (like time on page, revenue per user, test scores), the most common choice is Cohen’s d.

  • Cohen’s d: difference between two means measured in units of a standard deviation
  • Think of it as: how many “spread units” apart the groups are
  • Useful for comparing across studies, metrics, and sample sizes

Cohen’s d, in one sentence

It’s the difference in means divided by a typical amount of spread. If the difference is half a standard deviation, then d = 0.5.

A 60‑second example

You test a new onboarding screen.

  • Control average time to complete: 120 seconds
  • Variant average time: 132 seconds
  • Typical spread (standard deviation): about 60 seconds

Cohen’s d is:

d = (132 − 120) / 60 = 0.20

That’s a small effect. Whether to ship depends on context: does a small time increase help activation? Is there a trade‑off with completion rate? Effect size helps you weigh that.

How to compute it (without getting lost in math)

For two independent groups, Cohen’s d uses a pooled standard deviation (a blended estimate of spread):

s_p = sqrt( ((n1−1)·s1^2 + (n2−1)·s2^2) / (n1 + n2 − 2) )

d = (mean2 − mean1) / s_p
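
If code is easier to read than notation, here is a minimal Python sketch of those two formulas, using the means and SD from the onboarding example above (the sample sizes are made up for illustration):

import math

def cohens_d(mean1, s1, n1, mean2, s2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    # Pooled SD: a weighted blend of the two groups' spreads
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean2 - mean1) / s_p

# Onboarding example: control 120 s, variant 132 s, SD ~60 s in both groups
# (n1 = n2 = 500 is a hypothetical sample size)
print(cohens_d(120, 60, 500, 132, 60, 500))  # 0.2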

Practical guidance:

  • If the group spreads (s1, s2) look similar, using their pooled (or even average) SD is fine for intuition.
  • If spreads are very different, consider Glass’s Δ: divide by the control group’s SD.
  • For before/after measurements of the same people, use a paired version (standardize by the SD of the differences). Both alternatives are sketched just after this list.
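
Here is how those two alternatives look in code, continuing the sketch above (the function names are mine, not standard library calls):

import statistics

def glass_delta(mean_control, mean_variant, sd_control):
    """Glass's Δ: standardize by the control group's SD only."""
    return (mean_variant - mean_control) / sd_control

def paired_d(before, after):
    """Paired d: mean change divided by the SD of the per-person differences."""
    diffs = [a - b for b, a in zip(before, after)]
    return statistics.mean(diffs) / statistics.stdev(diffs)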

What the numbers mean (use context!)

Common conventions for absolute values of d:

  • 0.2 ≈ small (noticeable with lots of users)
  • 0.5 ≈ medium (clear, meaningful)
  • 0.8 ≈ large (substantial, obvious)

These are rules of thumb. A “small” effect can be huge at scale (conversion rate), while a “large” effect on a noisy vanity metric might be irrelevant. Always describe the practical impact in your own units (e.g., seconds saved, dollars per user, % activated).

What to report

  • The effect size (d) and which group is “better” (sign matters)
  • A 95% confidence interval around d, so readers can see how precise the estimate is (one way to compute it is sketched after this list)
  • The practical translation in your units (“about 12 seconds slower per user”)
  • Any assumptions (independent groups, similar spreads; or say if you used Glass’s Δ or a paired approach)
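
One simple way to get that confidence interval is a percentile bootstrap. A rough sketch, assuming you have raw per-user observations in group1 and group2, and reusing the cohens_d function from earlier:

import random
import statistics

def bootstrap_ci_for_d(group1, group2, n_boot=10_000, alpha=0.05):
    """Percentile-bootstrap confidence interval for Cohen's d."""
    # Assumes cohens_d(...) from the sketch above is in scope
    ds = []
    for _ in range(n_boot):
        # Resample each group with replacement and recompute d
        g1 = random.choices(group1, k=len(group1))
        g2 = random.choices(group2, k=len(group2))
        ds.append(cohens_d(statistics.mean(g1), statistics.stdev(g1), len(g1),
                           statistics.mean(g2), statistics.stdev(g2), len(g2)))
    ds.sort()
    return ds[int(n_boot * alpha / 2)], ds[int(n_boot * (1 - alpha / 2)) - 1]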

When not to use Cohen’s d

  • Binary outcomes (converted vs. not): use risk difference, risk ratio, or odds ratio instead (sketched after this list)
  • Very different spreads between groups: consider Glass’s Δ or Welch‑based approaches
  • Same people measured twice: use a paired effect size
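
For the binary case, the three alternatives are one-liners. A hypothetical sketch starting from raw conversion counts:

def binary_effect_sizes(conversions_a, n_a, conversions_b, n_b):
    """Risk difference, risk ratio, and odds ratio for two conversion rates."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    risk_difference = p_b - p_a
    risk_ratio = p_b / p_a                              # assumes p_a > 0
    odds_ratio = (p_b / (1 - p_b)) / (p_a / (1 - p_a))  # assumes 0 < p_a, p_b < 1
    return risk_difference, risk_ratio, odds_ratio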

For the curious (optional details)

  • Small samples tend to make d a bit optimistic. A common correction is Hedges’ g, which slightly shrinks d toward zero. If you’re publishing or your samples are small, report g as well.
  • If you already have a t‑test: d ≈ t × sqrt(1/n1 + 1/n2) for independent groups (a quick back‑of‑the‑envelope link between significance and size). Both are sketched in code below.
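
In code, both of those are tiny (a sketch using the standard approximation for the Hedges correction):

import math

def hedges_g(d, n1, n2):
    """Hedges' g: shrinks d toward zero to correct small-sample optimism."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))

def d_from_t(t, n1, n2):
    """Back-of-the-envelope d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)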

TL;DR

  • Effect size answers “how large is the difference?”—the decision‑making piece p‑values can’t give you
  • Cohen’s d standardizes a mean difference using a typical spread (standard deviation)
  • Use context and confidence intervals; choose alternatives for binary, paired, or highly unequal‑spread cases