Stratified Sampling: When Subgroups Matter

Stratified sampling is a probability sampling method that divides the population into subgroups (strata) and samples randomly within each. It produces more precise estimates than simple random sampling when the strata are internally homogeneous and meaningfully different from each other. It also guarantees representation of small but important subgroups that simple random selection might otherwise miss. Stratified sampling is the right choice whenever the research question involves comparisons across subgroups, or whenever a population contains identifiable groups that vary on the outcome of interest.

This article defines stratified sampling and explains when it improves on simple random sampling. It walks through the two main allocations (proportional and disproportionate), covers the implementation steps, and explains the sampling weights the design produces. For the broader sampling context, see our probability sampling overview. For the conceptual baseline, see our guide on simple random sampling. For the methodology pillar, see our research methodology guide.

Quick Answer: What Is Stratified Sampling?

Stratified sampling divides the target population into non-overlapping subgroups called strata, then draws a random sample within each stratum. The method has two main allocation rules. Proportional allocation samples each stratum at the same rate, so the sample composition reflects the population composition. Disproportionate allocation samples some strata at higher rates than others, usually to ensure that small but important subgroups are large enough for analysis. Stratified sampling produces more precise estimates than simple random sampling when the strata are internally homogeneous and externally different. The cost is added complexity: sampling weights are required in the analysis whenever the strata are sampled at unequal rates.

What Stratified Sampling Is

Stratified sampling has three required components. First, the population is divided into strata: subgroups that are mutually exclusive and collectively exhaustive (every member belongs to exactly one stratum, and every member belongs to some stratum). Second, the researcher draws a probability sample within each stratum independently, typically using simple random sampling within strata. Third, each population member's selection probability is documented; in stratified sampling, the probability is the within-stratum sampling rate, which may differ across strata.

The strata are usually defined by a variable known for every population member in advance: sex, age group, region, school district, ethnicity, income bracket, occupation, or some combination of these. The variable used to form strata is called the stratification variable. The choice of stratification variable is the most consequential decision in a stratified design, because the precision gains depend on how strongly the variable predicts the outcome.

Why Stratify? When Subgroups Matter

Stratified sampling addresses two distinct research needs. The first is precision. When a population contains identifiable groups that vary on the outcome of interest, stratifying on those groups produces tighter confidence intervals than simple random sampling would. The precision gain holds at the same total sample size. The second is subgroup coverage. When small but important subgroups exist in the population, random selection from the population as a whole can produce a sample with too few members of those subgroups. The result is a sample that can't support defensible subgroup-level analysis.

A concrete example clarifies both points. Imagine a study of physician burnout in a 10,000-person hospital system. The population is 70% women and 30% men, and burnout is on average meaningfully higher for women in this setting. A simple random sample of 400 yields about 280 women and 120 men on average, but the actual counts vary from draw to draw, and the standard errors of any sex-specific estimate depend on the subgroup sizes that happen to be realized. By contrast, a stratified sample with proportional allocation guarantees exactly 280 women and 120 men, eliminating the chance variation in subgroup sizes. A disproportionate allocation might instead draw 200 women and 200 men, which enlarges the smaller group and sharpens the comparison between the sexes. That sample no longer represents the population unless weights are applied, but the estimate for men and the between-sex comparison are more precise.
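
The size of that chance variation can be estimated directly. The sketch below uses only the figures from the example (10,000 physicians, 30% men, a sample of 400) and treats the number of men in a simple random sample as hypergeometric, which is the standard model for sampling without replacement. It is an illustration of the arithmetic, not part of any particular study's analysis.

```python
import math

# Figures from the burnout example above.
N = 10_000          # population size
n = 400             # total sample size
p_men = 0.30        # share of men in the population

# Under simple random sampling without replacement, the number of men
# in the sample follows a hypergeometric distribution.
expected_men = n * p_men
fpc = (N - n) / (N - 1)                       # finite population correction
sd_men = math.sqrt(n * p_men * (1 - p_men) * fpc)

# Rough 95% range for the number of men a simple random sample would deliver.
low = expected_men - 1.96 * sd_men
high = expected_men + 1.96 * sd_men

print(f"Expected men: {expected_men:.0f}, SD: {sd_men:.1f}")
print(f"Approximate 95% range: {low:.0f} to {high:.0f}")
```

The standard deviation comes out at roughly 9, so a simple random sample could plausibly deliver anywhere from about 102 to 138 men. Proportional stratification removes that spread by fixing the count at 120.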

The Two Main Allocations: Proportional vs Disproportionate

After the strata are defined, the researcher decides how many sample units to allocate to each stratum. Two allocation rules dominate practice.

Proportional allocation

Each stratum is sampled at the same rate. If the overall sample is 400 from a population of 10,000, the sampling fraction is 1-in-25 in every stratum. A stratum that is 30% of the population contributes 30% of the sample. The result is a sample whose composition reflects the population. Population-wide estimates can be computed by pooling the strata without weighting, because every sampled member carries the same weight (here, 25).
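
A minimal sketch of proportional allocation follows. The function name and the largest-remainder rounding rule are choices made for this illustration; the rounding step only matters when the proportional shares are not whole numbers. The stratum sizes reuse the hospital example from earlier in the article.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n sample units across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations sum exactly to n.
    """
    N = sum(stratum_sizes.values())
    exact = {h: n * size / N for h, size in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}        # floor of each share
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


strata = {"women": 7_000, "men": 3_000}
allocation = proportional_allocation(strata, n=400)
print(allocation)
```

With these inputs the allocation is 280 women and 120 men, the same 1-in-25 rate in both strata.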

Proportional allocation is the default in many surveys when the goal is national or population-wide estimates. It produces precise pooled estimates and acceptable (though not maximally precise) subgroup estimates. The precision gain over simple random sampling depends on how internally homogeneous and externally distinct the strata are.

Disproportionate allocation

Different strata are sampled at different rates, typically to oversample small but important subgroups. If a population is 95% white and 5% Black, proportional allocation in a 400-person sample produces 380 white and 20 Black respondents. The Black subsample is too small to support precise subgroup estimates. A disproportionate allocation might pull 250 white and 150 Black respondents, deliberately oversampling the smaller group. Total sample size and total cost are similar, but the precision of the Black-respondent estimates is much higher.

The cost of disproportionate allocation is that population-wide estimates can no longer be computed from the unweighted sample. Each respondent represents a different share of the population, so sampling weights must be applied in the analysis to recover the population mean correctly. Without weights, the population-wide estimates will be biased toward the oversampled subgroup. Most national surveys that publish race or ethnicity subgroup estimates use disproportionate allocation with weighting applied.
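
The bias is easy to see with numbers. The sketch below keeps the 95% / 5% split and the 250 / 150 allocation from the example, but the population size of 10,000 and the two stratum means are hypothetical values chosen only to make the arithmetic visible.

```python
# Assumed population of 10,000 split 95% / 5%, sampled 250 / 150 as in the text.
population = {"larger_group": 9_500, "smaller_group": 500}
sample_n = {"larger_group": 250, "smaller_group": 150}

# Hypothetical stratum means of some outcome, for illustration only.
stratum_mean = {"larger_group": 0.60, "smaller_group": 0.40}

# Design weight = inverse of the within-stratum selection probability.
weight = {h: population[h] / sample_n[h] for h in population}

# Unweighted mean: treats every respondent as representing the same share.
unweighted = (
    sum(sample_n[h] * stratum_mean[h] for h in population)
    / sum(sample_n.values())
)

# Weighted mean: each respondent counts for weight[h] population members.
weighted = (
    sum(weight[h] * sample_n[h] * stratum_mean[h] for h in population)
    / sum(weight[h] * sample_n[h] for h in population)
)

print(f"weights: {weight}")
print(f"unweighted mean: {unweighted:.3f}")
print(f"weighted mean:   {weighted:.3f}")
```

Under these assumed values the true population mean is 0.59. The weighted estimate recovers it, while the unweighted estimate is 0.525, pulled toward the oversampled smaller group.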

How to Conduct Stratified Sampling

The five steps below produce a defensible stratified sample. Each step has practical decisions that affect the design.

Step 1: Define the population and identify strata

As in any probability sampling design, define the target population precisely. Then identify the strata you'll use. The strata must be mutually exclusive (every member belongs to exactly one) and collectively exhaustive (every member belongs to some stratum). For multiple stratification variables (sex and age group, for example), the strata are the cross-classification (men 18-29, men 30-44, women 18-29, women 30-44, and so on).
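
One way to check the two requirements before sampling is to assign every frame member to a stratum and fail loudly when a member matches zero strata or more than one. The sketch below is an illustration only: the function names, the frame records, and the restriction of the frame to ages 18-44 are all hypothetical.

```python
def assign_strata(frame, rules):
    """Map each unit to its stratum; raise if the rules overlap or leave gaps.

    frame -- list of dicts, one per population member
    rules -- dict mapping stratum name to a function(member) -> bool
    """
    assignment = {}
    for member in frame:
        matches = [name for name, rule in rules.items() if rule(member)]
        if len(matches) != 1:
            raise ValueError(
                f"member {member['id']} matches {len(matches)} strata: {matches}"
            )
        assignment[member["id"]] = matches[0]
    return assignment


def make_rule(sex, low, high):
    """Build a membership test for one sex-by-age cell."""
    return lambda m: m["sex"] == sex and low <= m["age"] <= high


# Hypothetical stratum rules for a frame restricted to adults aged 18-44.
rules = {
    "men 18-29": make_rule("man", 18, 29),
    "men 30-44": make_rule("man", 30, 44),
    "women 18-29": make_rule("woman", 18, 29),
    "women 30-44": make_rule("woman", 30, 44),
}

# Hypothetical frame records, made up for illustration.
frame = [
    {"id": 1, "sex": "woman", "age": 27},
    {"id": 2, "sex": "man", "age": 41},
    {"id": 3, "sex": "woman", "age": 35},
]
assignment = assign_strata(frame, rules)
print(assignment)
```

A member aged 50 would match no rule and raise an error, which signals that the strata as written are not collectively exhaustive for that frame.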

Step 2: Choose the stratification variable

The variable used to form strata should predict the outcome of interest. The stronger the prediction, the larger the precision gain over simple random sampling. If you're studying salary, occupation predicts salary strongly, so occupation makes a good stratification variable. If you're studying voter turnout, age group predicts turnout strongly. If you're studying something the variable doesn't predict, stratifying on it adds complexity without gaining precision.

Step 3: Decide on the allocation

Use proportional allocation when the goal is precise population-wide estimates and every stratum is large enough that its proportional share of the sample supports any subgroup analysis you plan. Use disproportionate allocation when small subgroups need oversampling for defensible subgroup-level analysis. The decision is driven by whether you'll report population-wide estimates, subgroup-specific estimates, or both, and what precision each requires.

Step 4: Sample randomly within each stratum

Within each stratum, draw a simple random sample of the size determined by the allocation. The random selection mechanism is the same as in simple random sampling: a complete frame for the stratum, a random number generator, a documented seed for reproducibility. Each stratum is sampled independently of the others.
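
The step can be sketched with Python's standard library. The function name, the unit IDs, and the seed value below are hypothetical; the allocation reuses the 280 / 120 split from the hospital example.

```python
import random


def draw_stratified_sample(frame, allocation, seed):
    """Draw an independent simple random sample within each stratum.

    frame      -- dict mapping stratum name to the list of unit IDs in it
    allocation -- dict mapping stratum name to the number of units to draw
    seed       -- integer to record in the methods section for reproducibility
    """
    rng = random.Random(seed)
    sample = {}
    for stratum in sorted(frame):            # fixed order keeps the draw reproducible
        units = frame[stratum]
        k = allocation[stratum]
        if k > len(units):
            raise ValueError(f"allocation exceeds stratum size for {stratum!r}")
        sample[stratum] = rng.sample(units, k)    # sampling without replacement
    return sample


# Hypothetical frame: unit IDs are made up for illustration.
frame = {
    "women": [f"W{i:04d}" for i in range(7_000)],
    "men": [f"M{i:04d}" for i in range(3_000)],
}
sample = draw_stratified_sample(frame, {"women": 280, "men": 120}, seed=20240501)
print({stratum: len(units) for stratum, units in sample.items()})
```

Rerunning the function with the same frame, allocation, and seed returns the same units, which is what makes the documented seed useful to a reviewer.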

Step 5: Apply sampling weights in the analysis

When the strata are sampled at unequal rates, each respondent represents a different share of the population. Sampling weights adjust the analysis so that population-wide estimates are correct. The weight for each respondent is the inverse of their selection probability: if a respondent had a 1-in-50 chance of being selected, their weight is 50. Most statistical software supports survey weights through specialized tools: the survey package in R, the svy prefix in Stata, and dedicated survey packages such as samplics in Python (statsmodels accepts weights in some estimators but does not provide a full survey-design module). Analyses without weights treat the sample as if all units had been drawn with equal probability, which biases population-wide estimates in the direction of the oversampled strata.
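
The weighting rule in this step can be written compactly. The notation below is introduced here for illustration and does not appear elsewhere in the article; the formulas are the standard textbook ones for simple random sampling without replacement within each stratum.

```latex
% N_h: population size of stratum h;  n_h: sample size drawn from stratum h
% N = \sum_h N_h;  \bar{y}_h and s_h^2: sample mean and variance in stratum h
\begin{align*}
  \pi_h &= \frac{n_h}{N_h}, \qquad w_h = \frac{1}{\pi_h} = \frac{N_h}{n_h} \\[4pt]
  \bar{y}_{\mathrm{st}} &= \frac{\sum_h \sum_{i \in h} w_h \, y_{hi}}{\sum_h \sum_{i \in h} w_h}
                         = \sum_h \frac{N_h}{N} \, \bar{y}_h \\[4pt]
  \widehat{\mathrm{Var}}\left(\bar{y}_{\mathrm{st}}\right)
        &= \sum_h \left(\frac{N_h}{N}\right)^{2} \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^{2}}{n_h}
\end{align*}
```

The weights in each stratum sum to that stratum's population size, so the weights across the whole sample sum to N. The variance line is also why survey-specific software matters: it uses the stratum structure, not just the weights, to produce the standard error.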

Choosing the Right Stratification Variable

Three criteria guide the choice of stratification variable.

The variable must be available for every population member in the frame. Stratification requires that you know each member's stratum before sampling. Variables collected during the survey itself cannot serve as stratification variables; the information has to come from the frame.

The variable should predict the outcome of interest. Stratification's precision gains depend on the within-stratum variance being smaller than the population variance, which only holds when the stratification variable predicts the outcome. Stratifying on a variable unrelated to the outcome adds complexity without precision improvement.
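
A small numeric comparison shows where the gain comes from. The sketch below splits the population variance into a within-stratum part and a between-stratum part; proportional stratification removes the between-stratum part from the variance of the sample mean. The function name and the stratum means and standard deviations are hypothetical, and the finite population correction is ignored for simplicity.

```python
import math


def variance_comparison(strata, n):
    """Variance of the sample mean under SRS and under proportional
    stratified sampling (finite population correction ignored).

    strata -- list of (population_share, stratum_mean, stratum_sd) tuples
    """
    overall_mean = sum(w * m for w, m, _ in strata)
    within = sum(w * sd ** 2 for w, _, sd in strata)
    between = sum(w * (m - overall_mean) ** 2 for w, m, _ in strata)
    var_srs = (within + between) / n     # SRS carries both components
    var_strat = within / n               # stratification removes the between part
    return var_srs, var_strat


# Hypothetical outcome scores: 70% / 30% split, stratum means 50 and 40, SD 8 in each.
var_srs, var_strat = variance_comparison([(0.7, 50, 8), (0.3, 40, 8)], n=400)
print(f"SE under SRS:            {math.sqrt(var_srs):.3f}")
print(f"SE under stratification: {math.sqrt(var_strat):.3f}")
```

With these assumed values the standard error falls from about 0.46 to 0.40. If the two stratum means were equal, the between-stratum part would be zero and the two standard errors would match, which is the case of a stratification variable that does not predict the outcome.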

The number of strata should be manageable. Too few strata fail to capture the heterogeneity that makes stratification worthwhile. Too many strata produce small within-stratum samples and make weighted analysis cumbersome. In practice, five to fifteen strata is common, depending on the population size and the design effect.

When multiple variables could serve, the choice often combines them through cross-stratification. A study of educational outcomes might stratify by school type (public, private, charter) and by urbanicity (urban, suburban, rural), producing nine strata in the cross. Cross-stratification captures more heterogeneity but multiplies the number of strata, so it works best when the population is large enough to support adequate within-stratum sample sizes.
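
The multiplication is easy to sketch. The example below builds the nine school strata from the two variables named above and shows how the average sample per stratum shrinks as variables are added. The total sample of 400 and the third variable (four regions) are hypothetical, and the helper function is named only for this illustration.

```python
from itertools import product

school_types = ["public", "private", "charter"]
urbanicity = ["urban", "suburban", "rural"]

# Every combination of the two variables is one stratum.
strata = list(product(school_types, urbanicity))
print(len(strata), "strata")


def average_stratum_sample(total_n, *variables):
    """Average sample size per stratum if total_n were spread evenly across the cross."""
    n_strata = 1
    for levels in variables:
        n_strata *= len(levels)
    return total_n / n_strata


# Hypothetical third variable, added only to show the effect of one more cross.
regions = ["north", "south", "east", "west"]

print(round(average_stratum_sample(400, school_types, urbanicity), 1))
print(round(average_stratum_sample(400, school_types, urbanicity, regions), 1))
```

Two variables leave roughly 44 units per stratum on average from a sample of 400; adding a four-level third variable cuts that to about 11.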

Real-World Examples of Stratified Sampling

NHANES: stratified sampling at scale

The National Health and Nutrition Examination Survey uses a stratified multistage probability design. Strata are defined by combinations of region, urbanization, and demographic characteristics. Selected age, race and ethnicity, and income groups are oversampled to support precise subgroup health estimates. The resulting weighted analyses produce defensible national prevalence estimates for chronic conditions across demographic subgroups, with margins of error that researchers and policymakers can rely on. NHANES's stratification is the reason the CDC can report obesity prevalence by race-ethnicity with confidence intervals that mean what they claim.

Pew Research political polling: stratified random-digit dialing

Pew Research's national political polls stratify the random-digit-dialing frame by region of the country and demographic characteristics of the area. Stratification ensures that the sample includes appropriate representation from the Northeast, Midwest, South, and West, and from urban, suburban, and rural areas. Demographic post-stratification adjustments correct for any remaining imbalance against Census benchmarks. The combination produces national estimates with documented margins of error around plus or minus 3 percentage points on a sample of roughly 1,500 respondents.
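
The margin-of-error figure can be put in context with a short calculation. The sketch below computes the margin a simple random sample of 1,500 would give for a proportion at the worst case of 50%, then compares it with the roughly 3-point figure quoted above. The implied design effect is an inference from those two rounded numbers, not a published value.

```python
import math


def srs_margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


n = 1_500
moe_srs = srs_margin_of_error(n)             # worst case, p = 0.5
reported_moe = 0.03                          # "around plus or minus 3 points" in the text

# Ratio of variances: how much the design and weighting inflate the SRS variance.
implied_design_effect = (reported_moe / moe_srs) ** 2

print(f"SRS margin of error: {moe_srs:.4f}")
print(f"Implied design effect: {implied_design_effect:.2f}")
```

A simple random sample of that size would give about plus or minus 2.5 points, so a 3-point margin corresponds to a design effect of roughly 1.4. Weighting adjustments typically widen the margin in this way even when stratification itself improves precision.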

Educational research: stratified school sampling

A study of teacher retention in a state's public schools might stratify the school frame by urbanicity (urban, suburban, rural) and by district type (high-poverty, mid-poverty, low-poverty). Each stratum is then sampled independently, with disproportionate allocation oversampling high-poverty schools where retention rates are most variable and policy interest is highest. Weighted analysis produces state-level retention estimates and subgroup estimates that support both policy reporting and academic publication.

When Stratified Sampling Isn't Helpful

Stratification improves precision only when the stratification variable predicts the outcome, and it is practical only when each stratum can be sampled adequately. It works poorly in three situations.

No frame-level variable predicts the outcome. If the only variables known for every population member are arbitrary, stratification can't improve on simple random sampling. Examples include alphabetical order of last name, employee ID number, or file location in a registry. The added design complexity isn't worth the implementation cost.

The population is too small for adequate within-stratum samples. A 100-person population divided into 10 strata produces strata of about 10 each, which is usually too few to sample from without exhausting the stratum. For small populations, simple random sampling is often the right choice even when subgroup analysis is planned.

The strata are not internally homogeneous. If the within-stratum variance is as large as the population variance, the precision gain from stratification disappears. This happens when the stratification variable doesn't really predict the outcome, despite being theoretically reasonable.

Common Mistakes

Ignoring weights in the analysis. The most common error in stratified sampling is analyzing the data as if it had come from a simple random sample. Without sampling weights, estimates from a disproportionately allocated stratified sample are biased toward the oversampled strata, and standard errors are incorrect. Survey-specific analysis functions in R, Python, and Stata are designed to handle this; using basic functions instead is a frequent dissertation error.
Stratifying on an irrelevant variable. This means choosing a stratification variable because it's available rather than because it predicts the outcome. The design becomes more complex without producing any precision gain.
Stratifying on a variable not in the frame. If you can't identify each population member's stratum before sampling, you can't stratify on it. Stratification has to happen at the design stage, not after the data have been collected.
Over-stratifying. This means creating so many strata that within-stratum sample sizes are tiny. The design effect becomes hard to manage and weighted analysis becomes unwieldy. Five to fifteen strata is the typical range for most studies.
Confusing stratified sampling with quota sampling. Stratified sampling uses random selection within strata. Quota sampling sets target numbers per stratum but selects non-randomly within them (often by convenience). The two designs look similar but produce fundamentally different inferential possibilities.

Self-Audit Checklist for Stratified Sampling

Before you submit a manuscript or defend a dissertation that uses stratified sampling, work through this checklist. A yes to every item indicates that your method is documented at the standard reviewers expect.

Have I defined the target population precisely?
Have I named the stratification variable and justified why it predicts the outcome of interest?
Have I specified whether allocation was proportional or disproportionate, and explained the choice?
Have I reported the sampling fraction for each stratum (or the within-stratum sample sizes)?
Have I applied sampling weights in the analysis, using survey-specific functions in my statistical software?
Have I documented how sampling weights were computed and verified that they sum to the population size?
Have I confirmed that the strata are mutually exclusive and collectively exhaustive?
Have I limited my inferential claims to the population the design covers, distinguishing between population-wide estimates and subgroup estimates where the precision differs?

Frequently Asked Questions

What is stratified sampling?

Stratified sampling is a probability sampling method that divides the target population into non-overlapping subgroups called strata, then draws a random sample within each stratum. The strata are defined by a variable known for every population member, such as sex, age group, region, or school district. Within each stratum, the researcher draws a simple random sample. Stratified sampling produces more precise estimates than simple random sampling when the strata are internally homogeneous and externally different. It also guarantees representation of small but important subgroups.

What is the difference between stratified and simple random sampling?

Simple random sampling draws units from the population as a whole with equal probabilities. Stratified sampling first divides the population into subgroups (strata) and draws random samples within each. Stratified sampling produces more precise estimates when the strata predict the outcome of interest, and it guarantees adequate representation of small subgroups that random selection might otherwise miss. The cost of stratified sampling is added complexity: sampling weights are required in the analysis whenever the strata are sampled at unequal rates.

What is the difference between proportional and disproportionate allocation in stratified sampling?

Proportional allocation samples each stratum at the same rate, so the sample composition mirrors the population composition. Disproportionate allocation samples some strata at higher rates than others, typically to oversample small subgroups that need adequate sample sizes for subgroup-level analysis. Proportional allocation produces precise population-wide estimates without weighting in basic analyses. Disproportionate allocation requires sampling weights to recover correct population-wide estimates because each respondent represents a different share of the population.

When should you use stratified sampling instead of simple random sampling?

Use stratified sampling when the research question involves comparisons across subgroups, when small but important subgroups exist in the population, or when you have a stratification variable that predicts the outcome of interest. Stratified sampling produces more precise estimates than simple random sampling under those conditions. Use simple random sampling when no useful stratification variable is available, when the population is too small to support adequate within-stratum samples, or when the added design complexity isn't worth the marginal precision gain.

How do you choose a stratification variable?

The stratification variable should be known for every population member in the sampling frame, should predict the outcome of interest, and should produce a manageable number of strata. Variables that satisfy all three criteria include sex, age group, region, school district, occupation, and income bracket, depending on the research context. The strength of the precision gain depends on how strongly the variable predicts the outcome. Stronger prediction means more internally homogeneous strata and a larger reduction in the standard error of estimates.

Why does stratified sampling require sampling weights?

Sampling weights are required whenever the strata are sampled at unequal rates, which is the case in any disproportionate allocation. The weight for each respondent is the inverse of the selection probability, so a respondent in a stratum sampled at a 1-in-50 rate carries a weight of 50. Without weights, population-wide estimates are biased toward the oversampled strata, and standard errors are incorrect. Statistical software supports survey weights through specialized tools, such as the survey package in R, the svy prefix in Stata, and dedicated survey packages such as samplics in Python (statsmodels accepts weights in some estimators but does not provide a full survey-design module).

What is the difference between stratified sampling and quota sampling?

Stratified sampling and quota sampling both set target numbers per subgroup, but they differ critically in the selection mechanism within subgroups. Stratified sampling uses random selection within each stratum, with known selection probabilities. Quota sampling sets target numbers per subgroup but selects non-randomly within them, often by convenience or judgment. Stratified sampling supports statistical inference to the population with documented confidence intervals. Quota sampling is a non-probability method that doesn't support those inferences in the same way.