Simple Random Sampling: Definition and How to Do It

Simple random sampling is the probability sampling method in which every member of a defined population has an equal probability of being selected, and every possible sample of a given size is equally likely to be drawn. It's the conceptual baseline against which every other sampling method is evaluated, and it's the method most introductory statistics is built on. This article defines simple random sampling precisely, walks through a five-step process for conducting one correctly, covers practical tools for the random number generation step, and explains the technical distinctions (with replacement vs without, seed setting for reproducibility) that come up in dissertation defenses and peer review.

For the broader sampling cluster, see our probability sampling overview, which compares simple random sampling to its three siblings: stratified, cluster, and systematic sampling. For the underlying concept of sample and population, see our guide on population vs sample in research. For the broader methodology context, see our research methodology guide.

Quick Answer: What Is Simple Random Sampling?

Simple random sampling is the probability sampling method in which every member of the population has an equal probability of being selected for the sample, and every possible sample of a given size is equally likely. The five steps to conduct one are: define the population, build the sampling frame, decide on sample size, number the frame and generate the random selection, and pull the selected members while documenting the process. Simple random sampling without replacement is the most common variant; each member can be selected only once. With replacement, members can be selected more than once. Setting a random number seed makes the selection reproducible so that other researchers can verify the sample. Simple random sampling is the right choice when a complete frame exists and subgroup precision isn't a priority. When subgroups matter, use stratified sampling; when the population is dispersed and no individual-level frame exists, use cluster sampling.

What Simple Random Sampling Is

Simple random sampling requires three conditions. First, every member of the target population has the same probability of being selected. Second, that probability is greater than zero (no member is systematically excluded). Third, no selection is steered by another. With replacement, the draws are fully independent. Without replacement, each draw removes one member from the pool, but every remaining member is still equally likely to be chosen next, so every possible sample of size n has the same chance of being drawn.

The mathematical consequence of these conditions is that the sample mean is an unbiased estimator of the population mean. The same holds for totals and proportions; some other statistics, such as ratios or the sample standard deviation, are only approximately unbiased. The standard error formulas taught in introductory statistics apply directly. There are no design weights to compute, no stratum adjustments to make, no cluster effects to account for. The statistical machinery is at its simplest, which is why this method is the conceptual baseline.
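The equal-probability and unbiasedness claims can be checked by brute force on a toy case. The sketch below assumes a made-up population of five values and a sample size of two; it lists every possible sample and uses exact fractions, so it's an illustration rather than a proof.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population: one outcome value per member.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Without replacement, every subset of size n is equally likely,
# so listing all subsets gives the full sampling distribution.
all_samples = list(combinations(range(len(population)), n))

# Inclusion probability of member 0: the share of samples that contain it.
inclusion_prob = Fraction(sum(1 for s in all_samples if 0 in s), len(all_samples))

# Average the sample mean over every possible sample.
sample_means = [Fraction(sum(population[i] for i in s), n) for s in all_samples]
mean_of_sample_means = sum(sample_means) / len(sample_means)
population_mean = Fraction(sum(population), len(population))

print(inclusion_prob)                         # 2/5, which equals n/N
print(mean_of_sample_means, population_mean)  # 6 6
```

Any single sample mean can miss the population mean; it's the average across all possible samples that lands on it, which is what "unbiased" means here.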

Why Simple Random Sampling Is the Statistical Baseline

Nearly every introductory statistics textbook assumes simple random sampling under the hood. Confidence intervals, the central limit theorem, the t-test, the chi-square test, the F-test for ANOVA: in their textbook form, all of them assume independent observations drawn with equal probability, which is what a simple random sample provides (approximately so when sampling without replacement from a large population). When the sampling design is more complex (stratified, cluster, multistage), the same formulas no longer apply directly. The analysis must account for unequal selection probabilities and clustering through sampling weights, design effects, and specialized survey analysis software.

In practice, simple random sampling is used less often than stratified or cluster sampling in large-scale research, because real research questions usually involve subgroups or geographically dispersed populations. But simple random sampling is what every other method is benchmarked against. Stratified sampling claims to be more precise than simple random sampling. Cluster sampling claims to be cheaper but less precise. Those comparisons only make sense because simple random sampling is the reference point.

How to Conduct Simple Random Sampling

The five steps below produce a defensible simple random sample. Each step has practical decisions that affect whether the sample is what it claims to be.

Step 1: Define the target population

State precisely who the population is. Adults aged 18 and older in Cook County, Illinois, as of January 1, 2026. Faculty employed at Ohio State University with a primary appointment in any college, as of the fall 2026 semester. Patients diagnosed with type 2 diabetes at the Cleveland Clinic between 2018 and 2024. Population definition matters because it determines what your inference can claim. A sample of Cook County adults doesn't generalize to Illinois adults, and a sample of OSU faculty doesn't generalize to U.S. faculty.

Step 2: Build the sampling frame

The sampling frame is the operational list of every member of the population. A voter registration list. An HR database of current employees. A clinical registry of diagnosed patients. The frame is rarely identical to the target population. Some population members are missing from the frame (coverage error), and some non-members may be on the frame (eligibility error). Document the gap between frame and target population so reviewers know what your inference covers and what it doesn't.
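Where a second reference list exists to check the frame against, the gap can be quantified with simple set operations. This is a minimal sketch with made-up IDs. It assumes you can enumerate the target population independently, which is often not possible in practice; in that case the gap has to be described qualitatively instead.

```python
# Hypothetical ID sets for illustration only.
target_population = {"A01", "A02", "A03", "A04", "A05", "A06"}
sampling_frame = {"A02", "A03", "A04", "A05", "A07", "A08"}

# Population members the frame misses (coverage error).
missing_from_frame = target_population - sampling_frame

# Frame entries that are not in the population (eligibility error).
ineligible_on_frame = sampling_frame - target_population

# Share of the target population that the frame actually covers.
coverage_rate = len(target_population & sampling_frame) / len(target_population)

print(sorted(missing_from_frame))   # ['A01', 'A06']
print(sorted(ineligible_on_frame))  # ['A07', 'A08']
print(round(coverage_rate, 2))      # 0.67
```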

Step 3: Decide on sample size

Sample size depends on the desired precision of your estimates, the expected variability of the outcome, and the level of statistical power you need for inferential tests. A formal sample size calculation is the standard approach in dissertation proposals and grant applications. Several free calculators are available online, and most statistical software packages include sample size functions. As a rough guide, when estimating a proportion, a simple random sample of 200 to 400 gives a 95% margin of error of about ±7 to ±5 percentage points at worst (when the true proportion is near 50%), which is adequate for many simple research questions. Larger or smaller numbers may be appropriate depending on effect sizes and the variance of the outcome.
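For the common case of estimating a proportion, the calculation can be sketched in a few lines. The function below is hypothetical (its name and defaults are chosen for this illustration). It uses the standard margin-of-error formula with an optional finite population adjustment, and it covers precision only, not a power analysis for hypothesis tests.

```python
import math

def sample_size_for_proportion(margin, z=1.96, p=0.5, population_size=None):
    """Sample size needed to estimate a proportion within +/- margin.

    z=1.96 corresponds to 95% confidence; p=0.5 is the most conservative
    guess for the true proportion. Both defaults are assumptions.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population_size is not None:
        # Adjust downward when the population is small relative to n0.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(sample_size_for_proportion(0.05))                        # 385
print(sample_size_for_proportion(0.05, population_size=8000))  # 367
```

Treat the result as a floor for completed responses; if you expect nonresponse, the number of members you select needs to be larger.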

Step 4: Generate the random selection

Assign every member of the sampling frame a unique number (1 through N, where N is the number of members on the frame). Then generate a set of distinct random integers between 1 and N, with as many numbers as your desired sample size; they must be distinct because sampling is normally done without replacement. Members whose numbers match the generated set are in the sample. The random number generation step is where most practical questions about implementation come up, and it's worth covering the common tools.

R. The base R function sample() handles this in one line. To draw 400 from a population of 8,000 without replacement: sample(1:8000, 400, replace=FALSE).

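Python. Using NumPy, numpy.random.choice(numpy.arange(1, 8001), 400, replace=False) draws the same kind of sample. Passing the integer 8000 as the first argument instead would draw from 0 through 7999, so the numbering would be off by one relative to a frame numbered 1 through 8,000. The standard library function random.sample(range(1, 8001), 400) works equally well for small to moderate populations.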

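Stata. The sample command draws a random sample without replacement from the dataset in memory and drops the observations that weren't selected. Use sample 400, count to keep exactly 400 observations, and run set seed beforehand so the draw can be reproduced.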

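Excel. Use =RAND() in a new column to assign a random number to each member, paste the results as values (RAND() recalculates every time the sheet changes), sort by that column, and take the first 400. RAND() can't be seeded, so keep the pasted values as your record of the draw. Excel's random number generator is acceptable for educational and small-scale uses but is generally considered weaker than R, Python, or dedicated statistical software for research applications.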

Online generators. Sites like random.org provide true random numbers based on atmospheric noise. They're appropriate when you need a one-time selection without setting up a programming environment. Note that downloadable lists from random.org are not reproducible by seed setting in the same way as software-generated samples.
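Putting Step 4 together, the numbering and the draw can be sketched with Python's standard library alone. The frame size, sample size, and seed below reuse the figures from this article; the variable names are illustrative.

```python
import random

N = 8000          # number of members on the frame, numbered 1..N
n = 400           # desired sample size
SEED = 20260601   # chosen and written down before drawing

# A private generator keeps the draw independent of any other code
# that might also use the random module.
rng = random.Random(SEED)

# random.sample draws without replacement, so every number is distinct.
selected = sorted(rng.sample(range(1, N + 1), n))

print(len(selected), len(set(selected)))  # 400 400
```

Sorting is only for readability when matching numbers back to the frame; it doesn't change which members were selected.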

Step 5: Select members and document the process

Pull the population members whose numbers match the generated set. Document the date of the selection, the tool used, the seed (if any), and the random number sequence produced. This documentation is what allows another researcher (or you, six months from now) to verify the sample is what you said it was.
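One way to make that documentation concrete is to save a small machine-readable record next to the sample. The field names below are hypothetical, not a reporting standard; adapt them to whatever your committee or journal expects.

```python
import json
import platform
import random
from datetime import date

SEED = 20260601
selected = sorted(random.Random(SEED).sample(range(1, 8001), 400))

# Everything another researcher would need to re-run or audit the draw.
draw_record = {
    "draw_date": date.today().isoformat(),
    "tool": "Python standard library, random.Random.sample",
    "python_version": platform.python_version(),
    "seed": SEED,
    "frame_size": 8000,
    "sample_size": len(selected),
    "with_replacement": False,
    "selected_numbers": selected,
}

# In a real project this string would be written to a dated file.
record_json = json.dumps(draw_record, indent=2)
```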

Setting a Seed for Reproducibility

Software-based random number generators are technically pseudo-random: they produce sequences that pass statistical tests for randomness but are deterministically generated from a starting value called a seed. If you set the seed before generating random numbers, anyone using the same software with the same seed will produce the same sequence. To be safe, record the software version as well, because default algorithms occasionally change between releases. This means your sample becomes reproducible, which is a standard expectation in transparent research.

In R, set.seed(20260601) before the sample() call locks in reproducibility. In Python NumPy, numpy.random.seed(20260601) does the same for the legacy numpy.random functions; newer code typically creates a seeded generator with numpy.random.default_rng(20260601) instead. Choose the seed in advance and document it in your methods section. Seeds chosen after looking at the data, or changed until the sample produces a desired result, defeat the purpose entirely.
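The effect of a seed is easy to demonstrate. This sketch uses Python's standard library rather than NumPy, and the draw function is a hypothetical helper written for this example.

```python
import random

def draw(seed, frame_size=8000, sample_size=400):
    """Draw a simple random sample without replacement from 1..frame_size."""
    return random.Random(seed).sample(range(1, frame_size + 1), sample_size)

first = draw(20260601)
second = draw(20260601)   # same seed, same software
other = draw(20260602)    # a different seed

print(first == second)    # True: the draw is reproduced exactly
print(first == other)     # False: a new seed gives a different sample
```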

With Replacement vs Without Replacement

Simple random sampling has two technical variants. With replacement means each member can be selected more than once: after each draw, the member is returned to the pool. Without replacement means each member can be selected only once; after each draw, the member is removed from the pool. In research practice, sampling is almost always done without replacement, because researchers don't want to interview the same person twice or include the same patient twice in a tally.
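In Python's standard library the two variants map onto two different functions, which makes the difference easy to see. The frame of 20 members below is a toy assumption so that repeats are likely to show up.

```python
import random

rng = random.Random(20260601)
frame = range(1, 21)   # hypothetical frame numbered 1..20

# Without replacement: each member can appear at most once.
without_replacement = rng.sample(frame, 10)

# With replacement: each draw is made from the full frame,
# so the same member can appear more than once.
with_replacement = rng.choices(frame, k=10)

print(len(set(without_replacement)))  # always 10
print(len(set(with_replacement)))     # 10 or fewer, depending on repeats
```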

The distinction matters statistically when the sample is a large fraction of the population. When the sampling fraction (sample size divided by population size) is small (under 5%), the two methods produce essentially the same estimates and standard errors. When the sampling fraction is large, sampling without replacement produces smaller standard errors than sampling with replacement, and an adjustment called the finite population correction should be applied. Most introductory statistics ignores this distinction because most sampling fractions in practice are small enough that it doesn't matter.
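As a rough illustration of the finite population correction, the sketch below multiplies the usual standard error of a mean by the square root of (1 - n/N), the form commonly used when the standard deviation is estimated from the sample. The function name and the standard deviation of 10 are assumptions made for the example.

```python
import math

def standard_error_of_mean(sample_sd, n, population_size=None):
    """Standard error of a sample mean, optionally with the
    finite population correction for sampling without replacement."""
    se = sample_sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)
    return se

# Hypothetical sample standard deviation of 10 with n = 400.
print(standard_error_of_mean(10, 400))                        # 0.5
print(standard_error_of_mean(10, 400, population_size=8000))  # about 0.487 (5% fraction)
print(standard_error_of_mean(10, 400, population_size=800))   # about 0.354 (50% fraction)
```

At a 5% sampling fraction the correction shrinks the standard error by under 3%, which is why it's usually ignored; at 50% the reduction is close to 30%.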

Real-World Examples of Simple Random Sampling

A faculty study at Ohio State University

A researcher studying financial literacy among university faculty draws a simple random sample of 400 from the 8,000 faculty members at Ohio State University. The population is OSU faculty as of fall 2026. The frame is the HR database, which lists all 8,000. Each faculty member has a 1-in-20 probability of selection. The researcher uses R, calls set.seed(20260901), and runs sample(1:8000, 400, replace=FALSE). This produces a list of 400 numbers, which the researcher matches to faculty IDs. Documentation includes the seed, the date of the draw, the software version, and the resulting sample. This is the kind of cleanly designed simple random sample that supports defensible inference back to the population of OSU faculty.

A clinical chart review at a medical center

A medical resident is studying medication adherence and draws a simple random sample of 200 patient records. The population is the 4,000 patients diagnosed with congestive heart failure at a single medical center between 2020 and 2024. The frame is the electronic health record. The probability of selection is 1-in-20 per patient. Drawing is done in Python with a documented seed. Inference is to the population of CHF patients at that medical center in that time window.

When Simple Random Sampling Is the Wrong Choice

Simple random sampling is rarely the optimal method for large, complex research. The three most common reasons to use a different method:

Subgroups matter. If you need defensible estimates for specific subgroups, stratified sampling almost always produces more precise estimates than simple random sampling at the same sample size. This applies whenever you care about men vs women, regions, ethnicities, or income brackets separately. Without stratification, small subgroups may be under-represented by chance, even when the overall design is correct.

No individual-level frame. If you're studying schoolchildren across a state, households in a country, or patients across a multi-site health system, a master list of individuals usually doesn't exist or isn't accessible. Cluster sampling is the practical choice when the frame is at the group level (schools, neighborhoods, clinics) rather than the individual level.

Ordered frames where simple random sampling is overkill. If the frame is large and ordered, systematic sampling can produce results comparable to simple random sampling with much less implementation effort. Examples include a queue of customer records or a sequence of manufactured parts. The risk of systematic sampling is that periodic structure in the frame can produce bias, but for randomly ordered frames the methods are roughly equivalent.
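To put a number on the first of these reasons, the chance that a small subgroup ends up thin in a simple random sample can be computed exactly from the hypergeometric distribution. The frame size and sample size below reuse this article's figures; the subgroup of 160 members (2% of the frame) and the cutoff of five are made-up values for illustration.

```python
from math import comb

def prob_subgroup_count(k, frame_size, subgroup_size, sample_size):
    """Exact probability that a simple random sample drawn without
    replacement contains exactly k members of the subgroup."""
    return (
        comb(subgroup_size, k)
        * comb(frame_size - subgroup_size, sample_size - k)
        / comb(frame_size, sample_size)
    )

N, K, n = 8000, 160, 400   # frame, subgroup, and sample sizes
probs = [prob_subgroup_count(k, N, K, n) for k in range(min(K, n) + 1)]

# The expected subgroup count works out to n * K / N = 8.
expected_count = sum(k * p for k, p in enumerate(probs))

# Chance of drawing fewer than five subgroup members.
prob_fewer_than_5 = sum(probs[:5])

print(round(expected_count, 6))
print(prob_fewer_than_5)
```

If that probability is too high for the subgroup analysis you have planned, that's the signal to stratify and fix the subgroup's sample size in advance.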

Common Mistakes

Using "random" to describe a haphazard sample. Stopping people in a coffee shop or sending a survey to your professional contacts is not random sampling. Real random sampling uses a defined frame and a random number generator. Calling a convenience sample "random" is one of the most common methodological errors in student work.
Forgetting to set a seed. Without a documented seed, you cannot reproduce the sample, and another researcher cannot verify your selection. In dissertation defenses and high-quality journals, seed documentation is increasingly an expected detail.
Choosing the seed after looking at the data. A seed that's selected because it produces a "good" sample isn't a seed; it's a form of researcher bias. Choose the seed before drawing the sample and document the choice in advance.
Treating coverage error as sampling error. People who are missing from your frame (no phone, no email, not in the registry) are excluded from inference. This isn't a sampling problem solvable by drawing more units; it's a coverage problem that requires acknowledging the gap between your frame and your target population.
Substituting members after the draw. If the protocol says you draw 400 names and try to contact them, then swapping in someone else because a selected person is hard to reach turns the design into a convenience sample. The sample is defined by the selection mechanism, not by who you actually manage to interview; report nonresponse separately as a response rate.

Self-Audit Checklist for Simple Random Sampling

Before you submit a manuscript or defend a dissertation that uses simple random sampling, work through this checklist. If you can answer yes to each, your method is documented at the standard reviewers expect.

Have I defined the target population precisely, including any restrictions on age, geography, time period, or other characteristics?
Have I described the sampling frame and named the gap between the frame and the target population?
Have I documented the tool used to generate random numbers and the seed (if any)?
Have I justified the sample size based on a formal calculation tied to the research question?
Have I specified whether the sampling was with or without replacement?
Have I noted the sampling fraction and applied a finite population correction if it's large?
Have I distinguished between the sampling design and the response rate?
Have I limited my inferential claims to the population the frame actually covers, rather than overgeneralizing?

Frequently Asked Questions

What is simple random sampling?

Simple random sampling is the probability sampling method in which every member of the target population has an equal probability of being selected for the sample, and every possible sample of a given size is equally likely. It's the conceptual baseline against which other sampling methods are evaluated. The mathematical consequence of equal selection probabilities is that the sample mean, total, and proportion are unbiased estimators of their population counterparts. The standard error formulas taught in introductory statistics apply directly without weight adjustments.

How do you do simple random sampling?

The five steps are: define the target population precisely, build a sampling frame that lists every member of the population, decide on the sample size using a formal calculation, number each member of the frame and generate random numbers to select sample members, and pull the selected members while documenting the tool, the seed (if any), and the resulting selection. Random number generation can be done in R using the sample function, in Python using numpy.random.choice, in Stata using the sample command, in Excel using the RAND function combined with sorting, or with an online generator like random.org.

What is the difference between simple random sampling with and without replacement?

Sampling with replacement returns each selected member to the pool, so members can be selected more than once. Sampling without replacement removes each selected member from the pool, so each member can be selected only once. In research practice, sampling is almost always done without replacement. The statistical distinction matters when the sample is a large fraction of the population; with small sampling fractions (under 5 percent), the two methods produce essentially identical estimates.

Why is setting a seed important in simple random sampling?

Software-based random number generators produce sequences that are deterministic from a starting value called the seed. Setting and documenting the seed before drawing the sample means another researcher using the same software and the same seed can reproduce the exact sample, which is a basic expectation of transparent research. The seed must be chosen in advance, not selected after looking at the data, because changing the seed until the sample looks favorable defeats the purpose entirely.

When should you use simple random sampling instead of stratified or cluster sampling?

Use simple random sampling when a complete individual-level sampling frame exists, the population isn't heavily dispersed geographically, and subgroup precision isn't a priority. Stratified sampling produces more precise estimates than simple random sampling when subgroups matter, because it guarantees representation of small but important strata. Cluster sampling is the practical choice when the population is geographically dispersed and no individual-level frame exists, even though it typically has lower statistical precision at the same sample size.

Is simple random sampling the most common sampling method?

No. In large-scale research, stratified and cluster sampling are used more often than simple random sampling because real research questions usually involve subgroups or geographically dispersed populations. Simple random sampling is the conceptual baseline against which other methods are benchmarked, and it's most common in small-to-moderate research with a complete individual-level frame (a faculty directory, a patient registry, a customer database). Its prominence in textbooks reflects its role as the foundation of statistical inference, not its prevalence in applied work.

What is the difference between simple random sampling and random assignment?

Simple random sampling is the selection of participants from a population using a random mechanism with equal probabilities, which supports external validity (generalization from sample to population). Random assignment is the allocation of participants already in the study to experimental conditions using a random mechanism, which supports internal validity (causal inference about the effect of the treatment). The two procedures address different questions, and a study can use one without the other, both, or neither. For details on random assignment in experiments, see our guide on experimental research design.
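The two procedures are easy to tell apart in code. The sketch below assumes a hypothetical study that does both: it samples 400 members from a frame of 8,000, then randomly assigns those 400 to two equal arms. The seed and group sizes are illustrative.

```python
import random

rng = random.Random(20260601)

# Random sampling: who gets into the study (supports generalization).
sampled = rng.sample(range(1, 8001), 400)

# Random assignment: which condition each participant receives
# (supports causal inference). Shuffle a copy, then split it in half.
shuffled = sampled[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:200], shuffled[200:]

print(len(treatment), len(control))      # 200 200
print(set(treatment) & set(control))     # set(): no one is in both arms
```

A study that recruits volunteers and then runs only the second half of this sketch has random assignment without random sampling, which is the more common situation in experiments.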