Frequency Distributions and Histograms
A frequency distribution is the count of how often each value or range of values appears in a dataset. A histogram is the visualization of that frequency distribution for continuous data. Together they're the most fundamental tools for understanding what's actually in a dataset before any further analysis. They reveal the shape of the data, the center, the spread, the presence of outliers, and whether the distribution looks anything like what your statistical tests assume.
This guide covers what frequency distributions are and the types you'll encounter, what a histogram is and how it differs from a bar chart, how to choose bin widths and number of bins, what histograms reveal about your data, common distribution shapes, mistakes to avoid, and how to report histograms in APA format. For the broader framework that frequency distributions sit within, see the complete guide to descriptive statistics. For the decision framework on choosing the right visualization for your question, see the guide to box plots, scatter plots, and choosing the right visualization.
Quick Answer
Frequency distribution. A table or list showing how many observations fall into each value or range. Variants include simple frequency, cumulative frequency, relative frequency (proportions), and grouped frequency (for continuous data).
Histogram. A bar plot of a frequency distribution for continuous data. The horizontal axis shows the variable's values divided into bins; the vertical axis shows count or frequency. Bars touch each other because the data is continuous.
Histogram vs. bar chart. Histograms are for continuous data and have touching bars. Bar charts are for categorical data and have gaps between bars. This is one of the most common visualization mistakes in research papers.
What histograms reveal. Shape (symmetric, skewed), center (where most values cluster), spread (how wide the distribution is), modality (one peak or several), and outliers (isolated values in the tails).
What Is a Frequency Distribution?
A frequency distribution counts how often each value or range of values appears in a dataset. For a small dataset of test scores, the frequency distribution might be a simple table: how many students scored 85, how many scored 86, and so on. For a large dataset of continuous values, individual scores would produce a table with too many rows to read, so the values are grouped into bins (ranges).
Frequency distributions are the starting point for almost any analysis of new data. Before calculating a mean or running a statistical test, looking at how the data is distributed answers the most basic question: what does my data actually look like? The frequency distribution gives that answer in tabular form. The histogram gives the same answer visually.
Simple frequency
A simple frequency table lists each unique value alongside the count of how many times it occurs. This format works for categorical variables (gender, marital status, agreement level) and for discrete numeric variables with a limited range of values (number of children, hours of sleep rounded to whole hours).
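As a minimal sketch of the idea above, a simple frequency table can be built with Python's standard library. The data values and the variable names (`children`, `simple_freq`) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical discrete data: number of children reported by 12 households.
children = [0, 2, 1, 2, 3, 0, 1, 2, 2, 4, 1, 0]

# Counter tallies how many times each unique value occurs.
counts = Counter(children)

# Sort by value so the table reads from smallest to largest.
simple_freq = sorted(counts.items())

print("Value  f")
for value, f in simple_freq:
    print(f"{value:>5}  {f}")
```

The frequencies always sum to the sample size, which is a quick check that no observation was dropped.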
Cumulative frequency
A cumulative frequency table adds a running total: the count for each value plus the counts for all values below it. Cumulative frequencies tell you how many observations fall at or below a given value. This is useful for percentile calculations and for answering questions like "how many students scored 70 or below?"
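The running total can be sketched in a few lines. The scores below are made up for illustration, and `cumulative` is a hypothetical name for the resulting lookup table.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical test scores for 10 students.
scores = [55, 62, 62, 68, 70, 70, 70, 81, 85, 93]

counts = Counter(scores)
values = sorted(counts)                   # unique scores, ascending
freqs = [counts[v] for v in values]       # simple frequency of each score
cum_freqs = list(accumulate(freqs))       # running total of the frequencies

# Map each score to the number of students at or below it.
cumulative = dict(zip(values, cum_freqs))

print("Score  f  Cumulative f")
for v, f, cf in zip(values, freqs, cum_freqs):
    print(f"{v:>5}  {f}  {cf:>12}")
```

In this sample, `cumulative[70]` answers "how many students scored at or below 70?", and the last cumulative value always equals the sample size.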
Relative frequency
A relative frequency table reports proportions or percentages rather than raw counts. Relative frequencies are easier to compare across samples of different sizes. A count of 50 means different things in a sample of 100 than in a sample of 5,000. The corresponding percentages (50% versus 1%) make the comparison immediate.
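The comparison described above can be reproduced with a short sketch. The function name `relative_frequencies` and both samples are hypothetical.

```python
from collections import Counter

def relative_frequencies(data):
    """Return {value: proportion}; the proportions sum to 1."""
    n = len(data)
    return {value: count / n for value, count in sorted(Counter(data).items())}

# Two hypothetical samples of very different sizes, each with 50 "yes" answers.
small_sample = ["yes"] * 50 + ["no"] * 50
large_sample = ["yes"] * 50 + ["no"] * 4950

small_rel = relative_frequencies(small_sample)
large_rel = relative_frequencies(large_sample)

# Same raw count, very different share of the sample.
print(f"{small_rel['yes']:.0%} versus {large_rel['yes']:.0%}")
```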
Grouped frequency
For continuous variables, individual values aren't useful frequency categories because almost every observation has a different value. Grouped frequency tables divide the variable's range into bins (intervals) and count how many observations fall in each. A dataset of household incomes might be grouped into bins of $10,000 width: $0 to $9,999, $10,000 to $19,999, and so on. The grouped frequency table is what underlies the histogram.
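A grouped frequency table like the income example can be sketched as follows. The helper `grouped_frequency`, its parameters, and the income values are assumptions made for illustration. Bins are treated as half-open intervals, so a value exactly on a boundary goes into the higher bin.

```python
def grouped_frequency(data, start, width, n_bins):
    """Count observations in n_bins half-open intervals [lower, upper)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:      # values outside the bins are ignored
            counts[index] += 1
    edges = [start + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical household incomes in dollars.
incomes = [4_500, 12_000, 18_250, 23_900, 27_400,
           31_000, 9_999, 10_000, 38_700, 15_600]

edges, counts = grouped_frequency(incomes, start=0, width=10_000, n_bins=4)
for lower, upper, f in zip(edges, edges[1:], counts):
    print(f"${lower:,} to ${upper - 1:,}: {f}")
```

Note how $9,999 and $10,000 land in different bins: the choice of boundaries is part of the table, so it is worth stating it explicitly.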
What Is a Histogram?
A histogram is a bar plot of a grouped frequency distribution for continuous data. The horizontal axis shows the variable's values, divided into bins. The vertical axis shows the count or frequency in each bin. The bars touch each other along the horizontal axis because the underlying variable is continuous: there's no gap between one bin and the next.
Histograms are one of the most informative single plots a researcher can produce. They show the shape, center, and spread of the data at a glance. They reveal whether the data is symmetric or skewed, whether there's one peak or several, and whether outliers are pulling the tails. Any analysis of continuous data should begin with a histogram.
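To make the link between the grouped frequency table and the picture concrete, here is a rough text-only sketch that draws one row per bin. The function `render_histogram`, the bin edges, and the counts are hypothetical; a real figure would normally come from plotting software.

```python
def render_histogram(edges, counts, symbol="#"):
    """Return one text row per bin: the bin's range, then one symbol per observation."""
    rows = []
    for lower, upper, count in zip(edges, edges[1:], counts):
        rows.append(f"[{lower:>2}, {upper:>2}) | {symbol * count} ({count})")
    return rows

# Hypothetical grouped frequency table for 15 anxiety scores, bin width 4.
edges = [0, 4, 8, 12, 16, 20, 24]
counts = [1, 1, 5, 5, 2, 1]

rows = render_histogram(edges, counts)
print("\n".join(rows))
```

Each bin's upper edge is the next bin's lower edge, which is the text equivalent of bars that touch.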
Histogram vs. bar chart
A histogram and a bar chart look similar, but they represent different kinds of data. A histogram represents continuous data, and its bars touch because there's no logical gap between adjacent bins. A bar chart represents categorical data, and its bars have gaps because the categories are distinct.
Confusing the two is one of the most common visualization mistakes in research papers. A plot of average income by region with touching bars is wrong: region is categorical, so the bars should have gaps. A plot of household income distribution with gaps between bars is also wrong: income is continuous, so the bars should touch. The touching-or-not-touching distinction is the visual signal of whether the underlying variable is continuous or categorical, and getting it right is part of good data communication.
Building a Histogram: The Binning Decision
The single most consequential choice in building a histogram is the bin width (or equivalently, the number of bins). Too few bins flatten interesting features of the distribution; too many bins produce a noisy plot where every observation looks like its own peak. The right choice depends on the sample size and the natural scale of the variable.
How many bins?
Statistical software defaults usually produce reasonable histograms, but the defaults aren't always right for your specific dataset. A few quick guidelines:
- Too few bins (5 to 7): the histogram looks blocky and may hide skewness, bimodality, or other features. Often the default for small samples.
- Too many bins (50+): the histogram looks jagged. Random fluctuations become visible as if they were meaningful structure. Common when researchers use the same bin width that worked for a much larger dataset.
- A reasonable starting point: roughly 10 to 25 bins for most datasets. Tune from there based on what the data shows.
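The guidelines above can be illustrated by binning one small sample three ways. This is a sketch with made-up data containing two clusters; `histogram_counts` is a hypothetical helper and assumes the values are not all identical.

```python
def histogram_counts(data, n_bins):
    """Equal-width bin counts over [min, max]; the maximum falls in the last bin."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical sample with two clusters, one near 13 and one near 28.
data = [10, 11, 12, 12, 13, 13, 13, 14, 14, 15,
        25, 26, 27, 27, 28, 28, 28, 29, 29, 30]

for n_bins in (2, 5, 40):
    print(n_bins, histogram_counts(data, n_bins))
```

With 2 bins the two clusters collapse into two equal bars and the gap between them disappears. With 5 bins the empty middle becomes visible. With 40 bins most bins are empty and every distinct value looks like its own spike.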
Common binning rules
Three formulas give automatic bin-width suggestions and are implemented in most statistical software.
- Sturges' rule: the number of bins equals 1 plus log base 2 of the sample size, rounded up to a whole number. Works well for small to moderate samples drawn from approximately normal distributions. Tends to use too few bins for very large samples.
- Scott's rule: bin width equals about 3.5 times the standard deviation divided by the cube root of the sample size. Performs well for normal-shaped data and adapts to spread.
- Freedman-Diaconis rule: bin width equals 2 times the interquartile range divided by the cube root of the sample size. Less sensitive to outliers because it uses the IQR rather than the standard deviation. Often the best default for real-world data that may be skewed.
Software defaults vary. R's hist function defaults to Sturges; Python's matplotlib uses 10 bins; SPSS uses its own rule. When sharing a histogram, always note the bin width and the number of bins used so the reader knows what's been plotted.
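The three binning rules can be written out directly. This is a sketch, not the exact code any particular package runs: the function names are hypothetical, rounding Sturges' result up is one common convention, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the quartile definition your software uses.

```python
import math
import statistics

def sturges_bins(n):
    """Sturges' rule: 1 + log2(n), rounded up to a whole number of bins."""
    return math.ceil(1 + math.log2(n))

def scott_width(data):
    """Scott's rule: 3.5 * sample standard deviation / cube root of n."""
    return 3.5 * statistics.stdev(data) / len(data) ** (1 / 3)

def freedman_diaconis_width(data):
    """Freedman-Diaconis rule: 2 * IQR / cube root of n."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) / len(data) ** (1 / 3)

def bins_from_width(data, width):
    """Convert a bin width into a bin count that covers the data range."""
    return math.ceil((max(data) - min(data)) / width)

# Hypothetical sample: the integers 1 to 64.
data = list(range(1, 65))
fd_width = freedman_diaconis_width(data)

print("Sturges bins:", sturges_bins(len(data)))
print("Scott width:", round(scott_width(data), 2))
print("Freedman-Diaconis width:", round(fd_width, 2))
print("Freedman-Diaconis bins:", bins_from_width(data, fd_width))
```

Whichever rule you use, the resulting width and bin count are the numbers to report alongside the figure.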
What Histograms Reveal About Your Data
A well-constructed histogram answers most of the basic descriptive questions about a continuous variable in one image.
Shape
The overall shape of the histogram shows whether the distribution is symmetric or skewed. Symmetric distributions have similar tails on both sides; skewed distributions have one long tail. Positively skewed distributions (long right tail) are common for financial and behavioral variables. Negatively skewed distributions (long left tail) appear in test scores on easy assessments and similar bounded measures. For deeper coverage of shape statistics, see Editor World's guide to skewness and kurtosis.
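One way to put a number on the direction of skew is the moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5). The sketch below assumes that definition; statistical packages often apply a small-sample correction, so their values may differ slightly. The function name and data are hypothetical.

```python
import statistics

def sample_skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using divide-by-n moments."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]   # long right tail
left_skewed = [-x for x in right_skewed]          # mirror image, long left tail
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, sample in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, round(sample_skewness(sample), 2))
```

A positive value corresponds to a long right tail, a negative value to a long left tail, and a value near zero to a roughly symmetric histogram.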
Center
The center of the histogram shows where most values cluster. For approximately symmetric distributions, the center is close to the mean. For skewed distributions, the mean and the peak diverge because the long tail pulls the mean toward it. The histogram makes this visible in a way that the summary statistic alone cannot.
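The divergence between the mean and the peak in skewed data is easy to see with a small example. The values below are invented (incomes in thousands of dollars, with one very large value in the right tail).

```python
import statistics

# Hypothetical right-skewed data: incomes in thousands of dollars.
incomes_k = [22, 25, 25, 28, 30, 31, 35, 40, 58, 206]

mean = statistics.mean(incomes_k)
median = statistics.median(incomes_k)
mode = statistics.mode(incomes_k)   # most frequent value, a stand-in for the peak

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

Here the single large value pulls the mean above every observation but two, while the median and mode stay where most of the data sits.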
Spread
The horizontal range of the histogram shows how spread out the values are. A narrow histogram indicates values clustered tightly around the center. A wide histogram indicates values scattered broadly. The spread visible in the histogram corresponds to the standard deviation and the interquartile range, both of which are useful summary statistics for variability.
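The summary statistics mentioned above can be computed side by side for a narrow and a wide sample. This is a sketch with invented data; `spread_summary` is a hypothetical helper, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from other software.

```python
import statistics

def spread_summary(data):
    """Return the range, sample standard deviation, and interquartile range."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),
        "iqr": q3 - q1,
    }

# Two hypothetical samples with the same center (50) but different spread.
narrow = [48, 49, 49, 50, 50, 50, 51, 51, 52]
wide = [30, 38, 44, 47, 50, 53, 56, 62, 70]

narrow_summary = spread_summary(narrow)
wide_summary = spread_summary(wide)
print("narrow:", narrow_summary)
print("wide:  ", wide_summary)
```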
Modality
Modality describes how many peaks the distribution has. A unimodal distribution has one peak. A bimodal distribution has two peaks. A multimodal distribution has more than two. Most distributions in research are unimodal. Bimodal distributions often signal that the data is actually a mixture of two underlying populations. The textbook example is adult heights in a sample that includes both men and women, although real height data often shows one broad peak because the two group means are close relative to the spread within each group. Detecting a second peak in the histogram is often the first step in deciding whether to analyze the groups separately.
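A rough way to count peaks from bin counts is to look for local maxima. The sketch below is a simple heuristic, not a formal test of modality, and `count_peaks` is a hypothetical helper. Noisy histograms with many narrow bins will produce spurious peaks, so the result depends on the binning.

```python
def count_peaks(counts):
    """Count local maxima in a list of bin counts; a flat-topped peak counts once."""
    # Collapse runs of equal counts so plateaus are treated as a single bar.
    collapsed = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    # Pad both ends so a peak in the first or last bin is detected.
    padded = [-1] + collapsed + [-1]
    return sum(
        1
        for left, mid, right in zip(padded, padded[1:], padded[2:])
        if mid > left and mid > right
    )

unimodal = [1, 3, 7, 12, 8, 4, 1]
bimodal = [2, 6, 11, 6, 3, 7, 12, 5, 1]
noisy = [1, 4, 3, 6, 5, 8, 4, 2]     # small fluctuations read as extra peaks

print(count_peaks(unimodal), count_peaks(bimodal), count_peaks(noisy))
```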
Outliers
Isolated bars at the extreme ends of the histogram suggest potential outliers. A histogram with a small bar far from the main body of the distribution flags values that may need investigation. Whether to treat them as errors, valid extreme cases, or normal tail observations depends on substantive context. The histogram makes their presence visible; the decision about what to do with them requires judgment. Pairing the histogram with z-scores can sharpen the outlier assessment: see the guide to z-scores for the standardized approach.
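A z-score screen can be sketched as follows. The cutoff of 3 standard deviations is a common convention rather than a rule, the function name and data are hypothetical, and in small samples an extreme value inflates the standard deviation enough that it may not cross the cutoff.

```python
import statistics

def flag_outliers(data, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]

# Hypothetical scores: 19 values near 50 and one isolated value at 120.
scores = [47, 48, 49, 49, 50, 50, 50, 51, 51, 52, 53,
          48, 49, 50, 51, 52, 50, 49, 51, 120]

flagged = flag_outliers(scores)
print(flagged)
```

The flagged list only identifies candidates. Whether a value is an error or a valid extreme case still depends on the substantive context.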
Common Histogram Shapes
Real-world variables tend to follow a small number of recognizable shapes. Learning to identify them from a histogram makes downstream analysis decisions faster.
- Bell-shaped (approximately normal). Symmetric, single peak in the center, tails falling off smoothly on both sides. Common for measurement errors, adult heights within a population, IQ scores, and many biological variables. See Editor World's guide to the normal distribution for the properties of this reference distribution and what they imply for analysis.
- Right-skewed (positively skewed). Most observations cluster at lower values with a long tail extending to the right. Typical for income, wealth, household assets, reaction times, length of hospital stay, and most variables with a natural lower bound at zero. Researchers studying financial behavior, including Fisher and Yao (2017) and others using the Survey of Consumer Finances, routinely see right-skewed histograms for almost every monetary variable.
- Left-skewed (negatively skewed). Most observations cluster at higher values with a long tail extending to the left. Less common than right-skew. Appears in test scores on easy assessments, satisfaction ratings near a ceiling, and age at death in wealthy modern populations.
- Bimodal. Two distinct peaks. Often signals that the sample contains two underlying subgroups with different typical values. Adult heights in a mixed-sex sample are the textbook example because men and women have different mean heights, although two separate peaks appear only when the group means are far apart relative to the spread within each group.
- Uniform. Roughly equal frequencies across the range. Rare in real-world continuous data but does occur in some bounded measurements, simulation studies, and specific kinds of survey responses.
- Truncated or bounded. Cuts off sharply at one or both ends because of the variable's natural limits or a measurement ceiling/floor. A test so easy that many students earn the maximum score will produce a histogram that piles up against a sharp right boundary at the top of the scale (a ceiling effect). This pattern often signals measurement issues worth investigating.
Common Mistakes in Histograms
- Treating a histogram as a bar chart. Putting gaps between the bars when the data is continuous. The visual signal becomes wrong: readers see the gaps and read the data as categorical. Always allow bars to touch when the underlying variable is continuous.
- Using the wrong number of bins. Software defaults aren't always appropriate. Too few bins hide features. Too many bins manufacture noise. Inspect the histogram and adjust the bin count if the shape doesn't make sense.
- Forgetting to label the axes. A histogram without axis labels is unreadable. The horizontal axis needs the variable name and units. The vertical axis needs "Frequency" or "Count" (or "Proportion" / "Relative frequency" for normalized histograms).
- Comparing histograms with different bin widths. Two histograms of similar data plotted with different bin widths look very different. When comparing across groups or samples, use the same bin width on every histogram.
- Reading too much into small samples. A histogram of 20 observations can look very different from the underlying distribution it was drawn from. Small-sample histograms are noisy by nature. Don't over-interpret minor features.
- Truncating the vertical axis. Starting the count axis above zero exaggerates small differences and is generally considered poor practice in scientific reporting. The vertical axis on a histogram should start at zero.
- Using a histogram for categorical data. Categorical variables (gender, region, education level) need a bar chart, not a histogram. A histogram of categorical data isn't statistically meaningful.
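One of the mistakes listed above, comparing histograms drawn with different bin widths, can be avoided by computing a single set of bin edges and reusing it for every group. The sketch below assumes two hypothetical groups and hypothetical helpers `shared_edges` and `bin_counts`.

```python
def shared_edges(groups, width):
    """Bin edges that cover every group, aligned to multiples of the bin width."""
    lo = min(min(g) for g in groups)
    hi = max(max(g) for g in groups)
    start = (lo // width) * width
    edges = [start]
    while edges[-1] <= hi:           # keep adding edges until the maximum is covered
        edges.append(edges[-1] + width)
    return edges

def bin_counts(data, edges):
    """Count observations in the half-open bins defined by equally spaced edges."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for x in data:
        counts[int((x - edges[0]) // width)] += 1
    return counts

# Two hypothetical groups measured on the same scale.
group_a = [12, 15, 18, 21, 24, 27]
group_b = [22, 25, 31, 34, 38, 41, 44]

edges = shared_edges([group_a, group_b], width=10)
counts_a = bin_counts(group_a, edges)
counts_b = bin_counts(group_b, edges)
print(edges, counts_a, counts_b)
```

Because both groups share the same edges, a difference between the two sets of counts reflects the data rather than the binning.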
Histograms and Related Visualizations
Histograms are the standard visualization for the distribution of a single continuous variable, but they're not the only one. Each related plot has strengths the histogram lacks.
- Density plots smooth the histogram into a continuous curve. They replace the binning decision with a bandwidth (smoothing) choice and are useful for comparing distributions across groups overlaid on the same axes. The trade-off is that they hide the actual sample size and can over-smooth real features.
- Dot plots show every observation as a separate dot stacked at its value. They preserve the raw data more faithfully than histograms but become unreadable above a few hundred observations.
- Stem-and-leaf plots combine the visual structure of a histogram with the precision of a data table. They're rare in published research but useful for small datasets when presenting raw values matters.
- Box plots summarize the distribution by its median and quartiles. They lose some information about the shape but condense the distribution into a single compact figure, which makes them excellent for comparing many groups on one plot.
Reporting Histograms in APA Format
When a histogram appears in a manuscript, it's formatted as a figure with a clear caption. The caption identifies the variable, the sample size, and any relevant details about the binning or transformation.
A typical caption format: "Figure 1. Histogram of self-reported anxiety scores (BAI) for the full sample (N = 248). Bin width = 4."
When the histogram is used as a diagnostic for a downstream test (assessing normality before a t-test, for example), a brief mention in the text confirms the inspection: "Inspection of histograms (Figure 1) suggested that the dependent variable was approximately normally distributed within each group."
For frequency distribution tables, APA conventions follow the standard table format with clear column headers (Value or Range, f, %, Cumulative f). Include a note line below the table explaining any abbreviations or rounding rules.
When Professional Editing Helps
Visualization reporting is one of the places where small errors in figure captions, axis labels, and the histogram-versus-bar-chart distinction undermine reviewer confidence. Editor World's academic editing services include review of figure captions, statistical notation, and the substantive accuracy of methodological claims. The same standard is applied across dissertation editing, journal article editing, and essay editing. 100% human editing, no AI at any stage. You choose your own editor from verified profiles, and a free sample edit is available before you commit. Browse available editors by subject expertise to find someone whose background matches your field.
Frequently Asked Questions About Frequency Distributions and Histograms
What is a frequency distribution?
A frequency distribution is a table or list showing how often each value or range of values occurs in a dataset. Variants include simple frequency (count for each value), cumulative frequency (running total of counts), relative frequency (proportions or percentages rather than raw counts), and grouped frequency (counts within ranges of values, used for continuous variables). The frequency distribution is the starting point for understanding what's actually in a dataset before any further analysis. A histogram is the visualization of a grouped frequency distribution for continuous data.
What is the difference between a frequency distribution and a histogram?
A frequency distribution is a table of counts. A histogram is the bar-plot visualization of a frequency distribution for continuous data. The two contain the same information presented in different forms. The table is precise and supports calculations. The histogram is faster to interpret visually and makes the shape of the distribution immediately apparent. Most analyses use both: a histogram for visual inspection and a frequency table (or summary statistics derived from one) for the manuscript.
What is the difference between a histogram and a bar chart?
A histogram represents continuous data, and its bars touch each other along the horizontal axis because there's no gap between adjacent bins. A bar chart represents categorical data, and its bars have gaps between them because the categories are distinct. Confusing the two is one of the most common visualization mistakes in research papers. A plot of average income by region needs bars with gaps because region is categorical. A plot of the distribution of income needs bars that touch because income is continuous. The touching-or-not-touching distinction signals whether the underlying variable is continuous or categorical.
How many bins should a histogram have?
A reasonable starting point is 10 to 25 bins for most datasets, with adjustments based on what the histogram reveals. Too few bins (5 to 7) flatten interesting features. Too many bins (50 or more) produce a noisy histogram where random fluctuations look like meaningful structure. Three formulas give automatic suggestions: Sturges' rule (1 plus log base 2 of the sample size), Scott's rule (3.5 times the standard deviation divided by the cube root of the sample size), and the Freedman-Diaconis rule (2 times the interquartile range divided by the cube root of the sample size). Freedman-Diaconis is often the most robust default for real-world data.
What does a histogram tell you about your data?
A histogram reveals the shape, center, spread, modality, and outliers of a continuous variable in a single image. Shape shows whether the distribution is symmetric or skewed. Center shows where most values cluster. Spread shows how widely the values are scattered. Modality shows how many peaks the distribution has (one peak is unimodal, two is bimodal, more is multimodal). Isolated bars in the tails suggest potential outliers worth investigating. Any analysis of continuous data should begin with a histogram.
What does a bimodal histogram mean?
A bimodal histogram has two distinct peaks rather than a single one. This often signals that the data is actually a mixture of two underlying subgroups with different typical values. Heights of adults in a mixed-sex sample are the textbook example because men and women have different mean heights, although two separate peaks appear only when the group means are far apart relative to the spread within each group. Test scores in a class with two distinct subgroups (high performers and low performers) can also produce a bimodal pattern. Detecting bimodality through the histogram is often the first step in deciding whether to analyze the groups separately rather than treating them as a single sample.
Can you make a histogram for categorical data?
No. Categorical variables (gender, region, education level, agreement responses) need a bar chart, not a histogram. The bars of a bar chart have gaps between them to signal that the categories are distinct and discrete. The bars of a histogram touch each other to signal that the underlying variable is continuous. Plotting categorical data as a histogram, with touching bars, is misleading because it suggests a continuous numeric scale that doesn't exist.
How do you report a histogram in APA format?
Histograms are formatted as figures with clear captions. A typical caption identifies the variable, the sample size, and any relevant details about binning or transformation: "Figure 1. Histogram of self-reported anxiety scores for the full sample (N = 248). Bin width = 4." When a histogram serves as a diagnostic for a downstream test, a brief mention in the text confirms the inspection. For frequency distribution tables, follow the standard APA table format with clear column headers for value or range, frequency, percentage, and cumulative frequency, and include a note line explaining any abbreviations or rounding rules.
Content reviewed by Editor World editorial staff. Editor World, founded in 2010 by Patti Fisher, PhD, graduate of The Ohio State University, provides professional editing and proofreading services for academic researchers, doctoral candidates, faculty, business professionals, and authors worldwide. BBB A+ accredited since 2010 with 5.0/5 Google Reviews and 5.0/5 Facebook Reviews. More than 100 million words edited for over 8,000 clients in 65+ countries. Stevie Award winner: Gold 2019, Bronze 2018 and 2025. Native English editors from the United States, the United Kingdom, and Canada with subject-matter expertise across the social sciences, the natural and physical sciences, medicine, engineering, computer science, and the humanities. 100% human editing, no AI at any stage. Less than 5% of applicants are accepted to the editor panel. Recommended by the Boston University Economics Department, University of San Diego, University of Michigan, UCLA, University of Missouri, and more.