Cluster Sampling: When and How to Use It

Cluster sampling is the probability sampling method that selects whole groups (clusters) at random and then studies the population members within those groups. It's the practical choice when the target population is geographically dispersed or when no individual-level sampling frame exists. The trade-off is statistical precision. Cluster samples typically have wider confidence intervals than simple random samples of the same size because units within a cluster tend to be similar to each other. Cluster sampling earns its place in most large-scale survey research, where the alternative isn't a simple random sample but no sample at all.

This article defines cluster sampling and explains when it's the right design. It walks through single-stage and multistage variants, covers the design effect that distinguishes cluster samples from simple random samples, and gives the implementation steps. For the broader sampling context, see our probability sampling overview. For the conceptual baseline, see simple random sampling. For the methodology pillar, see our research methodology guide.

Quick Answer: What Is Cluster Sampling?

Cluster sampling selects groups (clusters) at random rather than individuals. Once clusters are selected, the researcher either studies every member of the chosen clusters (single-stage cluster sampling) or draws a random sample within them (multistage cluster sampling). Clusters are typically natural groupings: counties, schools, hospitals, neighborhoods, census tracts. Cluster sampling is the practical choice when the population is geographically dispersed and no individual-level frame exists. The trade-off is the design effect: cluster samples have larger standard errors than simple random samples at the same sample size because units within a cluster correlate. Cluster sampling is fundamentally different from stratified sampling, even though both involve subgroups. Stratified sampling samples within every subgroup; cluster sampling samples only some subgroups.

What Cluster Sampling Is

Cluster sampling has three required components. First, the population is organized into mutually exclusive groups called clusters, which are usually natural groupings (geographic areas, institutions, household units). Second, clusters are sampled at random from a complete list of all clusters in the population. Third, members of the sampled clusters are either all included (single-stage) or randomly subsampled (multistage). Population members in unsampled clusters are not in the study, which is the structural feature that distinguishes cluster sampling from stratified sampling.

The clusters themselves are called primary sampling units, or PSUs. The members within clusters are called secondary sampling units or simply elements. In multistage designs, the units selected at each stage are named by stage: primary sampling units, secondary sampling units, tertiary sampling units, and so on. The terminology matters because peer reviewers and dissertation committees use it precisely, and methods sections that use it correctly read as more competent.

Why Use Cluster Sampling?

Cluster sampling is rarely the design researchers would choose if statistical precision were the only concern. Simple random sampling produces tighter confidence intervals at the same total sample size, and stratified sampling does too when the strata are informative. Cluster sampling earns its place through two practical advantages.

No individual-level frame exists. For most large-population studies, no master list of individuals is available. There is no national list of every U.S. adult, every K-12 student, every hospital patient, every voter. There are, however, lists of administrative units: counties, school districts, hospitals, voting precincts. Cluster sampling uses those administrative units as the sampling frame, which is what makes large-scale national research feasible at all.

Cost concentration. Even when an individual-level frame exists, cluster sampling can be far cheaper to execute. A nationwide simple random sample requires data collection in hundreds of locations. A cluster sample of 50 counties concentrates the fieldwork. Interviewers, equipment, and supervision are deployed to a small number of sites rather than spread across the country. For face-to-face surveys, household visits, or in-person assessments, the cost savings are substantial.

Single-Stage vs Multistage Cluster Sampling

Cluster sampling comes in two main forms. The difference is whether every member of a sampled cluster is included or only some.

Single-stage cluster sampling

After clusters are sampled at random, every member of each sampled cluster is included in the study. A study of teacher practices might randomly sample 30 elementary schools and then survey every teacher in those schools. Single-stage designs work when the clusters are small enough that complete enumeration is feasible. The advantage is operational simplicity: once the cluster is chosen, no further sampling decisions are needed.

Multistage cluster sampling

After clusters are sampled, the researcher draws a random sample of members within each sampled cluster. A national health survey might sample counties at the first stage, then census blocks within counties, then households within blocks, then adults within households. Each stage involves a separate random selection. Multistage designs let researchers control total sample size more precisely. They're nearly universal in large national surveys, where complete enumeration of selected counties or census tracts would be impossibly expensive.
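
The difference between the two forms can be sketched in a few lines of Python. This is a minimal illustration rather than a production sampler: the frame of 100 schools, the teacher IDs, and the function names are all hypothetical, and clusters are drawn with equal probability.

```python
import random

def single_stage_sample(clusters, n_clusters, rng):
    """Pick n_clusters at random and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {cid: list(clusters[cid]) for cid in chosen}

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Pick n_clusters at random, then subsample members within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        cid: rng.sample(clusters[cid], min(n_per_cluster, len(clusters[cid])))
        for cid in chosen
    }

# Hypothetical frame: 100 schools with 12 to 40 teachers each.
rng = random.Random(2024)
frame = {
    f"school_{i:03d}": [f"s{i:03d}_t{j:02d}" for j in range(rng.randint(12, 40))]
    for i in range(100)
}

single = single_stage_sample(frame, 30, rng)
two_stage = two_stage_sample(frame, 30, 10, rng)

# Single-stage size depends on which schools were drawn;
# two-stage size is fixed by design at 30 schools x 10 teachers.
print(sum(len(members) for members in single.values()))
print(sum(len(members) for members in two_stage.values()))
```

The single-stage total varies with the sizes of the schools that happen to be drawn, while the two-stage total is fixed in advance, which is the sample size control the multistage design provides.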

The Design Effect: Cluster Sampling's Trade-Off

Cluster samples produce wider confidence intervals than simple random samples of the same size. The reason is that units within a cluster correlate. Students in the same school share teachers, curricula, and neighborhood characteristics. Households in the same neighborhood share local conditions. The correlation among within-cluster units is called the intracluster correlation, or ICC, often written as the Greek letter rho.

The quantitative effect is captured by the design effect (DEFF). For a cluster sample with average cluster size m and intracluster correlation rho, the design effect is approximately:

DEFF = 1 + (m - 1) * rho

The design effect is a ratio of variances. A design effect of 2.0 means the cluster sample's sampling variance is twice that of a simple random sample at the same total sample size, so its standard errors are about 1.41 times as large (the square root of 2). To achieve the same precision as a simple random sample, the cluster sample needs to be twice as large. Common ICCs in social and health research range from 0.01 to 0.10, which produces design effects of roughly 1.5 to 6 at a cluster size of 50, and smaller values for smaller clusters. Reporting the design effect is standard in survey methodology and increasingly expected in dissertation defenses.
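
As a worked illustration of the formula, the short Python sketch below tabulates the design effect, the implied standard error inflation (the square root of DEFF, since DEFF is a variance ratio), and the effective sample size for a nominal sample of 1,000. The function names and the grid of cluster sizes and ICCs are illustrative choices, not values from any particular study.

```python
import math

def design_effect(m, rho):
    """Approximate design effect for average cluster size m and ICC rho."""
    return 1 + (m - 1) * rho

def effective_sample_size(n, m, rho):
    """Size of the simple random sample that would give the same precision."""
    return n / design_effect(m, rho)

for m in (10, 25, 50):
    for rho in (0.01, 0.05, 0.10):
        deff = design_effect(m, rho)
        print(
            f"m={m:>2}  rho={rho:.2f}  DEFF={deff:.2f}  "
            f"SE inflation={math.sqrt(deff):.2f}  "
            f"effective n of 1000={effective_sample_size(1000, m, rho):.0f}"
        )
```

For example, 11 units per cluster with an ICC of 0.10 gives a design effect of 2.0, so 1,000 clustered observations carry about as much information as 500 independent ones.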

How to Conduct Cluster Sampling

The five steps below produce a defensible cluster sample. Each has practical decisions that affect the design.

Step 1: Define the population and identify natural clusters

Define the target population precisely. Then identify the natural clusters: groupings that are mutually exclusive, collectively exhaustive, and ideally similar in size. Common clusters include counties, school districts, hospitals, neighborhoods, census tracts, and households. The choice of cluster shapes the entire design, so it deserves explicit justification in the methods section.

Step 2: Construct a cluster-level sampling frame

Build a list of every cluster in the population. For U.S. studies, county and census tract lists are publicly available. School district lists are maintained by state departments of education. Lists of Medicare-certified hospitals are maintained by the Centers for Medicare & Medicaid Services (CMS). Document the source of your cluster frame and any gaps between the frame and the target population.

Step 3: Determine cluster count and within-cluster sample size

Two sample size decisions are required: how many clusters to sample, and how many units to draw within each. The balance between the two depends on the design effect. For a fixed total sample size, more clusters with fewer members per cluster reduce the design effect but increase fieldwork cost. Fewer clusters with more members per cluster reduce cost but increase the design effect. The optimal balance depends on the ICC and on the ratio of per-cluster cost to per-unit cost.
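
One common way to make this balance concrete is the textbook two-stage cost model, in which total cost is (number of clusters) x (cost per cluster + m x cost per unit). Under that assumption, and using the approximate design effect 1 + (m - 1) * rho, the variance-minimizing cluster size for a fixed budget is the square root of (cost per cluster / cost per unit) x (1 - rho) / rho. The sketch below applies it; the budget and cost figures are hypothetical, and a real design would also weigh response rates and unequal cluster sizes.

```python
import math

def optimal_cluster_size(cost_per_cluster, cost_per_unit, rho):
    """Variance-minimizing units per cluster under a simple linear cost model."""
    return math.sqrt((cost_per_cluster / cost_per_unit) * (1 - rho) / rho)

def clusters_affordable(budget, cost_per_cluster, cost_per_unit, m):
    """Number of whole clusters the budget covers at m units per cluster."""
    return int(budget // (cost_per_cluster + m * cost_per_unit))

# Hypothetical inputs: every figure below is an assumption for illustration.
budget = 100_000        # total fieldwork budget
c_cluster = 1_000       # fixed cost of opening one cluster (travel, setup)
c_unit = 20             # cost of one additional interview within a cluster
rho = 0.05              # assumed intracluster correlation

m_opt = optimal_cluster_size(c_cluster, c_unit, rho)
m = round(m_opt)
k = clusters_affordable(budget, c_cluster, c_unit, m)
deff = 1 + (m - 1) * rho

print(f"optimal units per cluster: {m_opt:.1f} (use {m})")
print(f"clusters affordable: {k}, total n: {k * m}")
print(f"design effect: {deff:.2f}, effective n: {k * m / deff:.0f}")
```

The formula shows the direction of the trade-off: a higher ICC pushes toward smaller clusters, and a higher per-cluster cost pushes toward larger ones.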

Step 4: Sample clusters at random

Use a random selection mechanism to draw clusters from the frame. Clusters can be sampled with equal probability or with probability proportional to size (PPS), where larger clusters have higher selection probabilities. PPS designs produce more efficient estimates when cluster sizes vary substantially, and they are standard in major national surveys. Document the selection mechanism, the random seed if any, and the resulting cluster sample.
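
A PPS draw is often implemented as systematic sampling along the cumulative size totals. The sketch below shows one such approach under simplifying assumptions: the frame of 200 counties and their sizes are hypothetical, the function name is illustrative, and any cluster larger than the sampling interval is rejected rather than handled as a certainty selection.

```python
import random

def pps_systematic(sizes, n, rng):
    """Systematic PPS draw. Returns (cluster_id, selection_probability) pairs."""
    total = sum(sizes.values())
    interval = total / n
    if max(sizes.values()) > interval:
        raise ValueError(
            "a cluster exceeds the sampling interval; "
            "treat it as a certainty selection and sample the rest"
        )
    # Random start in [0, interval), then equally spaced selection points.
    start = rng.random() * interval
    targets = [start + i * interval for i in range(n)]

    selected, cumulative, t = [], 0.0, 0
    for cid, size in sizes.items():
        cumulative += size
        # A cluster is selected when a selection point falls in its size range.
        while t < n and targets[t] < cumulative:
            selected.append((cid, n * size / total))
            t += 1
    return selected

# Hypothetical frame: 200 counties with 500 to 5,000 households each.
rng = random.Random(11)  # fixed seed so the draw can be documented and rerun
county_sizes = {f"county_{i:03d}": rng.randint(500, 5000) for i in range(200)}

sample = pps_systematic(county_sizes, 20, rng)
for cid, prob in sample[:5]:
    print(cid, county_sizes[cid], round(prob, 4))
```

Recording the selection probability alongside each sampled cluster at this step makes the weighting in the analysis straightforward later.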

Step 5: Sample within clusters (or census within)

For single-stage designs, include every member of each sampled cluster. For multistage designs, draw a random sample within each cluster using the same probability sampling principles. Sampling weights are required in the analysis to account for the unequal selection probabilities the design produces. The survey package in R and the svy prefix in Stata support cluster sample analysis with appropriate weights and design-based standard errors. In Python, statsmodels provides cluster-robust standard errors and generalized estimating equations, and dedicated survey packages such as samplics handle design-based estimation.
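
The weight for each sampled unit is the inverse of its overall selection probability, which in a two-stage design is the product of the cluster's selection probability and the unit's selection probability within the cluster. The sketch below illustrates the arithmetic with two hypothetical clusters and made-up outcome values; it computes point estimates only, and standard errors should come from survey-aware software.

```python
def design_weight(p_cluster, p_within):
    """Inverse of the overall selection probability for a two-stage design."""
    return 1.0 / (p_cluster * p_within)

def weighted_mean(values, weights):
    """Weighted estimate of the population mean."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# Hypothetical design: 2 of 10 clusters drawn with equal probability (0.2),
# then 10 members sampled within each selected cluster.
p_cluster = 2 / 10
cluster_a = {"size": 100, "values": [1.0] * 10}   # small cluster, low outcome
cluster_b = {"size": 400, "values": [2.0] * 10}   # large cluster, high outcome

values, weights = [], []
for cluster in (cluster_a, cluster_b):
    p_within = len(cluster["values"]) / cluster["size"]
    w = design_weight(p_cluster, p_within)
    values.extend(cluster["values"])
    weights.extend([w] * len(cluster["values"]))

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(unweighted, weighted)
```

Each sampled member of the larger cluster stands in for four times as many people as a member of the smaller one, so the weighted mean (about 1.8) sits closer to the larger cluster's value than the unweighted mean (1.5) does.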

Cluster Sampling vs Stratified Sampling

Cluster sampling and stratified sampling are easy to confuse because both involve subgroups. The sampling logic is opposite, though, and the methodological consequences differ sharply.

Stratified sampling divides the population into subgroups and samples within every subgroup. Every stratum contributes to the sample. The design improves precision because the strata are designed to be internally homogeneous.

Cluster sampling divides the population into subgroups and samples only some of them. Many clusters contribute zero observations to the sample. The design reduces cost but tends to reduce precision because clusters often contain similar members.

A useful way to remember the distinction: stratified sampling wants each stratum to be internally homogeneous and the strata to differ from one another, so that a small sample within each stratum estimates it precisely. Cluster sampling wants each cluster to be internally diverse and similar to the overall population, so that any single cluster looks like the population in miniature. The two designs solve different problems.

Real-World Examples of Cluster Sampling

NHANES: multistage cluster sampling at the national level

The National Health and Nutrition Examination Survey uses a four-stage probability cluster design. Primary sampling units are counties or groups of contiguous counties. Within sampled PSUs, the second stage selects census segments. The third stage selects households within segments. The fourth stage selects individuals within households. Selected subgroups defined by characteristics such as age, sex, race, ethnicity, and income are oversampled to support subgroup precision. The survey examines roughly 5,000 individuals per year, drawn from a sequence of clusters that makes national fieldwork feasible.

Demographic and Health Surveys: international cluster sampling

DHS surveys in low-income and middle-income countries use multistage cluster designs with census enumeration areas (EAs) as the primary sampling units. A typical DHS draws 200 to 500 EAs, stratifying by region and urban-rural status. Within each EA, a household listing is conducted, and 20 to 30 households are randomly selected. DHS produces nationally representative estimates of fertility, child mortality, nutrition, and HIV prevalence across more than 90 countries, all because cluster sampling makes the fieldwork affordable.

Educational research: schools as primary sampling units

Studies of student achievement, teacher practices, and school effects almost always use cluster sampling with schools as the PSUs. A state-level study of mathematics achievement might sample 80 schools at random, then sample 25 students per school. The intracluster correlation among students in the same school is typically 0.10 to 0.20, which at 25 students per school gives design effects of roughly 3.4 to 5.8 under the approximation 1 + (m - 1) * rho. International assessments like PISA and TIMSS use this design at scale, sampling schools across countries and students within schools.

When Cluster Sampling Isn't Helpful

Cluster sampling is the wrong design in three common situations.

An individual-level frame exists and cost is not a concern. If you can list every population member and the population isn't geographically dispersed, simple random sampling or stratified sampling will produce tighter confidence intervals at the same total sample size.

The clusters are very homogeneous internally. When the ICC is high (above about 0.20), the design effect grows quickly and the cluster sample loses much of its statistical efficiency. In this case, sampling more clusters with fewer members each (or switching to a different design) is often the better path.

The research question requires precise comparisons between specific clusters. Cluster sampling is optimized for population-wide estimates, not for comparing one cluster to another. Studies that focus on a small number of named institutions may do better with a census of those institutions than with a cluster sample drawn from a larger frame.

Common Mistakes

Ignoring the design effect in analysis. Analyzing cluster-sampled data as if it had been drawn by simple random sampling produces standard errors that are too small and confidence intervals that are too narrow. Survey-specific analysis software exists for exactly this reason, but students sometimes use basic statistical functions instead.
Confusing cluster sampling with stratified sampling. Both involve subgroups, but the logic is opposite. Stratified sampling samples within every subgroup. Cluster sampling samples only some subgroups. Methods sections that confuse the two read as methodologically weak.
Selecting clusters with unequal probability without applying weights. Probability-proportional-to-size (PPS) sampling produces unequal selection probabilities by design. Analyses that ignore this bias the estimates toward the larger clusters.
Treating cluster-sampled data as independent observations. Linear regression, t-tests, and ANOVA assume independent observations. Cluster-sampled data violate this assumption. Multilevel models, survey-weighted regression, or generalized estimating equations are designed to handle cluster effects correctly.
Sampling too few clusters. Cluster samples with very few PSUs (say, fewer than 20 to 30) can produce unstable variance and design effect estimates and confidence intervals with poor coverage. Sample size planning for cluster designs requires attention to both the number of clusters and the number of units within them.
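
The first and fourth mistakes can be seen in a small simulation. The sketch below generates hypothetical data in which every unit shares a random cluster effect (true ICC of 0.2), then compares the naive standard error of the mean with one based on the variation between cluster means. It assumes equal cluster sizes, equal weights, and no finite population correction, so it illustrates the problem rather than replacing survey software.

```python
import math
import random
import statistics

rng = random.Random(7)
k, m = 40, 25  # hypothetical design: 40 clusters of 25 units each

data = []
for _ in range(k):
    cluster_effect = rng.gauss(0, 1)  # shared by every unit in the cluster
    data.append([cluster_effect + rng.gauss(0, 2) for _ in range(m)])

# Naive standard error: treats all k * m observations as independent.
all_values = [y for cluster in data for y in cluster]
naive_se = statistics.stdev(all_values) / math.sqrt(len(all_values))

# Cluster-aware standard error: treats the k cluster means as the
# independent observations (valid here because cluster sizes are equal).
cluster_means = [statistics.fmean(cluster) for cluster in data]
cluster_se = statistics.stdev(cluster_means) / math.sqrt(k)

print(f"naive SE:         {naive_se:.3f}")
print(f"cluster-aware SE: {cluster_se:.3f}")
print(f"implied design effect: {(cluster_se / naive_se) ** 2:.2f}")
```

With these settings the formula-based design effect is 1 + 24 x 0.2 = 5.8, so the naive standard error should understate the uncertainty by a factor of roughly 2.4, subject to simulation noise.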

Self-Audit Checklist for Cluster Sampling

Before you submit a manuscript or defend a dissertation that uses cluster sampling, work through the checklist below. Answering yes to every item indicates that your method is documented at the standard reviewers expect.

Have I defined the target population precisely and justified the choice of cluster?
Have I described the cluster-level sampling frame and acknowledged any coverage gaps?
Have I specified single-stage or multistage and explained the rationale?
Have I reported the number of clusters and the average within-cluster sample size?
Have I reported the intracluster correlation and the design effect, where calculable?
Have I applied sampling weights and used survey-specific analysis functions for cluster-sampled data?
Have I distinguished my cluster sample from a stratified sample in the methods section?
Have I limited my inferential claims to the population the design covers?

Frequently Asked Questions

What is cluster sampling?

Cluster sampling is the probability sampling method that selects whole groups (clusters) at random and then studies the population members within those groups. Clusters are typically natural groupings: counties, school districts, hospitals, neighborhoods, census tracts. Cluster sampling is the practical choice when the target population is geographically dispersed or when no individual-level sampling frame exists. The trade-off is that cluster samples typically have wider confidence intervals than simple random samples at the same total sample size because units within a cluster correlate.

What is the difference between cluster sampling and stratified sampling?

Stratified sampling divides the population into subgroups and samples within every subgroup; every stratum contributes to the sample. Cluster sampling divides the population into subgroups and samples only some of them; many clusters contribute zero observations. Stratified sampling improves precision because strata are designed to be internally homogeneous and externally different. Cluster sampling typically reduces precision because clusters often contain similar members, but it makes large-scale research feasible when no individual frame exists.

What is the difference between single-stage and multistage cluster sampling?

In single-stage cluster sampling, every member of each sampled cluster is included in the study. In multistage cluster sampling, the researcher draws a random sample of members within each sampled cluster. Single-stage designs work when clusters are small enough that complete enumeration is feasible. Multistage designs let researchers control total sample size more precisely and are nearly universal in large national surveys where complete enumeration of selected clusters would be impossibly expensive.

What is the design effect in cluster sampling?

The design effect (DEFF) quantifies how much larger the sampling variance of a cluster sample is compared to a simple random sample of the same total size. For a cluster sample with average cluster size m and intracluster correlation rho, the design effect is approximately 1 + (m - 1) * rho. A design effect of 2.0 means the variance is twice as large, so standard errors are about 1.41 times as large (the square root of 2) and the cluster sample needs to be twice as large to achieve the same precision. Common ICCs in social and health research produce design effects of roughly 1.5 to 6 at a cluster size of 50.

When should you use cluster sampling instead of simple random sampling?

Use cluster sampling when no individual-level sampling frame exists for your target population, when the population is geographically dispersed enough that simple random sampling would be prohibitively expensive, or when natural groupings exist that make cluster-based data collection substantially cheaper. The trade-off is reduced statistical efficiency: cluster samples have wider confidence intervals than simple random samples at the same total sample size. Cluster sampling earns its place when the alternative is no study at all, not a more efficient simple random sample.

What are primary sampling units (PSUs) in cluster sampling?

Primary sampling units are the clusters selected at the first stage of a cluster sampling design. In a study using counties as clusters, counties are the primary sampling units. The members within clusters are called secondary sampling units or simply elements. In multistage designs, the units at each subsequent stage are named by stage: secondary sampling units, tertiary sampling units, and so on. Peer reviewers and dissertation committees use this terminology precisely, so accurate use in the methods section signals methodological competence.

What is intracluster correlation?

Intracluster correlation (ICC), often written as the Greek letter rho, is the correlation among units within the same cluster. Students in the same school correlate because they share teachers, curricula, and neighborhood characteristics. Households in the same census tract correlate because they share local conditions. Common ICCs in social and health research range from 0.01 to 0.10. Higher ICCs produce larger design effects, which means cluster samples need to be larger to achieve the same precision as simple random samples.
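
When pilot or prior data are available, the ICC can be estimated rather than assumed. The sketch below uses the standard one-way ANOVA estimator for equal-sized clusters, (MSB - MSW) / (MSB + (m - 1) * MSW), where MSB and MSW are the between-cluster and within-cluster mean squares. The function name and the toy data are illustrative, and unequal cluster sizes or small numbers of clusters call for the adjusted estimators in survey or mixed-model software.

```python
import statistics

def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters."""
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size (2 or more units)")

    grand_mean = statistics.fmean([y for c in clusters for y in c])
    means = [statistics.fmean(c) for c in clusters]

    # Between-cluster and within-cluster mean squares.
    msb = m * sum((mu - grand_mean) ** 2 for mu in means) / (k - 1)
    msw = sum(
        (y - mu) ** 2 for c, mu in zip(clusters, means) for y in c
    ) / (k * (m - 1))

    return (msb - msw) / (msb + (m - 1) * msw)

# Toy data: three hypothetical clusters of four units each.
pilot = [
    [10.0, 11.0, 9.0, 10.0],
    [14.0, 15.0, 13.0, 14.0],
    [12.0, 12.0, 11.0, 13.0],
]
rho_hat = anova_icc(pilot)
print(round(rho_hat, 3))
print(round(1 + (len(pilot[0]) - 1) * rho_hat, 3))  # implied design effect
```

The estimate can be negative when units within clusters are more varied than chance would produce; in planning calculations it is usually truncated at zero.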