Box Plots, Scatter Plots, and Choosing the Right Visualization for Your Data
Quick Answer
The decision tree.
One variable, continuous: histogram or box plot. One variable, categorical: bar chart. Two variables, both continuous: scatter plot. Two variables, one categorical and one continuous: box plot by group or grouped bar chart. Trends over time: line chart.
The two workhorses.
Box plots show you the spread, skew, and outliers in your data. Scatter plots show you the relationship between two continuous variables. If you only learn two charts, learn these.
The mistake to avoid.
Bar charts for continuous data. They hide everything that matters: the spread, the distribution shape, and the outliers.
Why Visualization Comes Before Any Test
Statistical tests assume things about your data. Many parametric tests assume the data is roughly normal. Some assume equal variances across groups. Some are sensitive to outliers. You need to know whether your data meets these assumptions before you run any test. Visualization is how you find out.
A quick visual check often reveals patterns that summary statistics hide. The mean and standard deviation look reasonable. Then you plot the data and see that 30% of the values are clustered at zero with a long right tail. Now you know not to run a t-test without thinking about transformation. Or you see two distinct peaks instead of one. Now you know there are two subgroups in your sample. None of this shows up in M and SD alone.
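To make the point concrete, here is a minimal sketch using a made-up sample in which 30% of the values are zero and the rest form a long right tail. The numbers are hypothetical and chosen only for illustration. The mean and SD alone never mention the cluster at zero; the median and the share of zeros do.

```python
from statistics import mean, median, stdev

# Hypothetical sample: 6 of 20 values are zero, the rest trail off to the right.
values = [0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 3, 4, 5, 6, 8, 11, 15, 22, 34, 60]

m = mean(values)
sd = stdev(values)
med = median(values)
share_zero = sum(v == 0 for v in values) / len(values)

print(f"M = {m:.1f}, SD = {sd:.1f}")         # the usual two-number summary
print(f"median = {med}")                     # well below the mean: a sign of right skew
print(f"share of zeros = {share_zero:.0%}")  # the cluster that M and SD do not show
```

A histogram of the same values would show both features at a glance, which is the argument for plotting first.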
The principle is simple: always plot your data before you analyze it. Even a quick histogram saves you from running the wrong test on data that violates its assumptions.
Histograms
Histograms show the distribution of a single continuous variable. Values are grouped into bins. The height of each bar shows how many observations fall in that bin.
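The binning step can be sketched in a few lines. This is an illustration of equal-width binning, not how any particular plotting package implements it; the function name `bin_counts` and the scores are hypothetical.

```python
def bin_counts(values, n_bins):
    """Count observations in n_bins equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes the values are not all identical
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin rather than a bin of its own.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical test scores.
scores = [52, 55, 61, 64, 66, 67, 70, 71, 71, 73, 74, 75, 78, 80, 83, 85, 88, 92]
counts = bin_counts(scores, 4)

# A text histogram: one '#' per observation in each bin.
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```

Each row of '#' marks plays the role of one bar: its length is the number of observations in that bin.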
What histograms reveal
- The shape of the distribution. Symmetric, skewed, bimodal, uniform.
- The center. Where most values cluster.
- The spread. How wide the range of values is.
- Outliers. Bars far from the main cluster.
When to use a histogram
Histograms are the first thing to make when you have a single continuous variable and want to understand its distribution. Test scores, ages, heights, income, reaction times. Any time you'd report a mean or median, a histogram should come first.
When not to use a histogram
Histograms don't work well for small samples. As a rule of thumb, with fewer than about 20 observations you don't have enough data to fill the bins meaningfully. The shape will look ragged and won't reflect the underlying distribution. For small samples, a box plot or dot plot works better.
Common mistakes
- Wrong bin width. Too few bins hide the shape. Too many show only noise. Most software picks reasonable defaults, but check that your bins reveal the patterns you'd expect.
- Using a bar chart instead. Histograms have continuous x-axes. Bar charts have categorical ones. Don't confuse them.
- Comparing distributions on different histograms. If you want to compare two groups, overlay them on one histogram with transparency, or switch to box plots.
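If you want a starting point for bin width, two common rules of thumb are Sturges' rule and the Freedman-Diaconis rule. The sketch below implements both with the standard library. The function names are made up for this example, and the result is a first guess to check against the plot, not a definitive answer.

```python
import math
from statistics import quantiles

def sturges_bins(n):
    """Sturges' rule: number of bins = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    q1, _, q3 = quantiles(values, n=4)
    return 2 * (q3 - q1) / len(values) ** (1 / 3)

print(sturges_bins(100))  # 8

# Hypothetical sample: the integers 1 to 100.
sample = list(range(1, 101))
print(round(freedman_diaconis_width(sample), 1))
```

Sturges' rule depends only on the sample size. Freedman-Diaconis also uses the interquartile range, so it is less affected by a few extreme values.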
Box Plots
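Box plots are one of the most informative single-variable visualizations. They compress five numbers into one compact display: the minimum, first quartile, median, third quartile, and maximum. When outliers are plotted as separate points, the whiskers stop at the most extreme values that are not outliers instead of running to the true minimum and maximum.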
What box plots reveal
- The median. The line inside the box.
- The middle 50% of the data. The box itself runs from the 25th to the 75th percentile.
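- Skewness. If the median sits closer to one end of the box, your data is skewed. A median near the bottom of the box suggests a right (positive) skew. A median near the top suggests a left (negative) skew.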
- Outliers. Individual points plotted beyond the whiskers, usually defined as values more than 1.5 times the interquartile range past the box.
- Group comparisons. Multiple box plots side by side show how different groups compare on the same variable.
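The numbers behind a box plot can be computed directly. The sketch below applies the 1.5 times IQR rule described above. The function name and the reaction times are hypothetical. Software packages use slightly different quartile conventions, so your plotting library may place the box edges a little differently.

```python
from statistics import quantiles

def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme values that are still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical reaction times in milliseconds, with one slow trial.
reaction_ms = [310, 325, 330, 342, 350, 355, 361, 370, 384, 395, 720]
stats = box_plot_stats(reaction_ms)
print(stats["outliers"])  # [720]
```

Here the upper whisker stops at 395, and the 720 ms trial is drawn as an individual point beyond it.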
When to use a box plot
Box plots are essential when you want to compare distributions across groups. They're also the right choice when your data has outliers or skew, since the box plot makes both visible at a glance. For any dataset where the standard deviation might mislead because of skew, a box plot shows the actual distribution better.
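A side-by-side comparison takes only a few lines in a plotting library. The sketch below assumes matplotlib is installed; the group names and scores are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical test scores for two groups.
control = [61, 64, 66, 68, 70, 71, 73, 75, 78, 95]
treatment = [66, 70, 72, 74, 75, 77, 79, 81, 84, 88]

fig, ax = plt.subplots(figsize=(4, 3))
boxes = ax.boxplot([control, treatment])  # one box per group, at x = 1 and x = 2
ax.set_xticks([1, 2])
ax.set_xticklabels(["Control", "Treatment"])
ax.set_ylabel("Test score (0-100)")  # say what the scale is
```

Call `fig.savefig(...)` with a file name of your choice to write the figure to disk. With these numbers, the score of 95 in the control group falls beyond the upper whisker and is drawn as a separate point.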
When not to use a box plot
Box plots can hide bimodality. A bimodal distribution and a uniform distribution can produce similar-looking box plots, but they tell very different stories about your data. If you suspect your data might have multiple peaks, use a histogram or violin plot instead.
Common mistakes
- Treating outliers as errors. Outliers marked on a box plot aren't necessarily mistakes. They're just unusual values. Investigate them before removing them.
- Not labeling the y-axis units. A box plot of "scores" is useless if readers don't know whether the scale is 0-10, 0-100, or 0-1000.
- Comparing too many groups at once. Six or seven box plots side by side start to look cluttered. Group sensibly or use small multiples.
Scatter Plots
Scatter plots show the relationship between two continuous variables. Each point represents one observation, with its position determined by its values on both variables.
What scatter plots reveal
- The direction of the relationship. Positive (points trend upward from left to right), negative (downward), or none (a cloud with no clear pattern).
- The strength of the relationship. Tightly clustered points along a line suggest a strong relationship. Widely scattered points suggest a weak one.
- The shape of the relationship. Linear (a straight line), curved, or something more complex.
- Outliers and influential points. Single points that don't fit the overall pattern.
- Heteroscedasticity. When the spread of one variable changes systematically with the other. This violates the constant-variance assumption of ordinary least squares regression.
When to use a scatter plot
Use a scatter plot any time you want to examine the relationship between two continuous variables. Before running a correlation. Before running a regression. The scatter plot tells you whether the relationship is linear (the assumption behind both Pearson correlation and linear regression) or some other shape that requires different methods.
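Here is a small sketch of why the plot has to come first. It computes Pearson's r by hand for two made-up relationships: one perfectly linear, one a perfect curve. The function name `pearson_r` and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]  # a straight line
curved = [x ** 2 for x in xs]     # a parabola: y is fully determined by x

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(round(r_linear, 3), round(r_curved, 3))
```

The parabola gives r = 0 even though y depends entirely on x. A correlation coefficient alone would suggest no relationship. The scatter plot would show the curve.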
When not to use a scatter plot
Scatter plots break down when one or both variables are categorical. Plotting hours studied against pass/fail (yes/no) produces two horizontal stripes that don't reveal much. Use a box plot of hours studied by outcome instead.
Scatter plots also struggle with overplotting in large datasets. With 10,000 points, individual observations blur into a dark cloud. Solutions include transparency (alpha blending), hex bins, or 2D density plots.
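Two of these fixes can be sketched as follows, assuming matplotlib is installed. The data are simulated and the parameter values (point size, alpha, grid size) are arbitrary starting points, not recommendations.

```python
import random
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Simulated data: 10,000 points with a positive relationship plus noise.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(10_000)]
y = [xi + random.gauss(0, 1) for xi in x]

fig, (ax_alpha, ax_hex) = plt.subplots(1, 2, figsize=(8, 3.5))

# Option 1: small, mostly transparent points, so dense regions look darker.
ax_alpha.scatter(x, y, s=5, alpha=0.1)
ax_alpha.set_title("Transparency (alpha = 0.1)")

# Option 2: hexagonal bins, colored by how many points fall in each one.
hexes = ax_hex.hexbin(x, y, gridsize=30)
fig.colorbar(hexes, ax=ax_hex, label="Points per hexagon")
ax_hex.set_title("Hex bins")
```

Transparency keeps individual points visible at the edges of the cloud. Hex bins give up individual points in exchange for a readable picture of density.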
Common mistakes
- Forcing a line through a curved relationship. If the scatter plot shows a clear curve, don't fit a straight regression line to it.
- Confusing correlation with causation in the plot. A scatter plot shows association. It says nothing about cause and effect.
- Mislabeled axes. Always label what each axis represents, with units.
- Wrong axis scaling. Logarithmic data on a linear scale produces misleading visual patterns. Use log axes when appropriate.
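The axis-scaling point can be checked with simple arithmetic. In this sketch, made-up data that doubles at every step is uneven on the raw scale and perfectly even on a log scale, which is why a log axis turns that kind of curve into a straight line.

```python
import math

# Hypothetical exponential growth: the value doubles at every step.
xs = list(range(10))
ys = [3 * 2 ** x for x in xs]

# Step sizes on the raw scale grow with every step.
raw_steps = [b - a for a, b in zip(ys, ys[1:])]

# Step sizes on a log2 scale stay constant.
log_steps = [math.log2(b) - math.log2(a) for a, b in zip(ys, ys[1:])]

print(raw_steps[:4])                       # [3, 6, 12, 24]
print([round(s, 6) for s in log_steps[:4]])
```

On a linear axis, the early points are squashed against the bottom of the chart. On a log axis, every doubling takes up the same vertical distance.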
Bar Charts
Bar charts compare categorical data. Each bar represents a category. The height of the bar represents the value (a count, a proportion, or a summary statistic like a mean).
When to use a bar chart
- Counts or frequencies by category. How many participants in each treatment group, how many responses in each survey category.
- Proportions or percentages by category. What percentage of respondents chose each option.
- Comparing means across a small number of groups. Three to five treatment conditions, for instance.
When not to use a bar chart
Don't use bar charts for continuous data. A bar chart of test scores hides the distribution, the spread, and the outliers. Use a histogram or box plot instead.
Don't use bar charts of means without showing variability. A bar showing the mean of one group versus another tells readers nothing about whether the difference is meaningful. At minimum, add error bars showing standard deviation, standard error, or confidence intervals. Better, switch to a box plot that shows the full distribution.
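The three kinds of error bar are different sizes, which is why the figure note has to say which one you used. The sketch below computes all three for a made-up sample. It uses a normal approximation for the 95% confidence interval, which is an assumption made for simplicity; for small samples a t-based interval is wider.

```python
import math
from statistics import NormalDist, mean, stdev

def error_bar_sizes(values, confidence=0.95):
    """Half-lengths of three common error bars around a group mean."""
    n = len(values)
    sd = stdev(values)                              # sample standard deviation
    se = sd / math.sqrt(n)                          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return {"mean": mean(values), "sd": sd, "se": se, "ci_half_width": z * se}

# Hypothetical scores for one group.
scores = [72, 75, 78, 80, 81, 83, 85, 88, 90, 94]
bars = error_bar_sizes(scores)
for name, size in bars.items():
    print(f"{name}: {size:.2f}")
```

SD bars describe the spread of the data. SE and confidence interval bars describe uncertainty about the mean, and they shrink as the sample grows.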
Common mistakes
- Bar charts of continuous data. The single biggest visualization mistake in research papers. Switch to histograms or box plots.
- Missing error bars on group means. A bar of means without error bars is misleading. Readers can't tell if the difference between groups is real or noise.
- Truncated y-axis. Cutting off the y-axis at a value above zero exaggerates differences between bars. Start the y-axis at zero unless you have a specific reason not to.
- Too many categories. A bar chart with 20 categories is hard to read. Consider grouping smaller categories into "other," or switch to a different chart type.
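The effect of a truncated y-axis comes down to arithmetic. This sketch compares how tall two bars are drawn relative to each other, depending on where the axis starts. The function name and the values are hypothetical.

```python
def visual_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b when the y-axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Two group means, 50 and 55: a 10% difference.
print(visual_ratio(50, 55))                 # 1.1
print(visual_ratio(50, 55, axis_start=48))  # 3.5
```

With the axis starting at 48, the second bar is drawn 3.5 times as tall as the first, even though the underlying difference is still 10%.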
Line Charts
Line charts show how a variable changes over time or across an ordered sequence. The x-axis is usually time or another ordered variable. The y-axis is the value being tracked.
When to use a line chart
- Time series data. Stock prices, monthly sales, longitudinal study measurements at multiple time points.
- Trends across ordered conditions. Performance at different difficulty levels, response time at different speeds.
- Comparing multiple groups over time. Treatment vs. control across pre-test, mid-test, post-test, and follow-up.
When not to use a line chart
Don't use line charts when your x-axis is categorical without an inherent order. Connecting "men" to "women" with a line implies a progression that doesn't exist. Use a bar chart for categorical comparisons.
Common mistakes
- Too many lines on one chart. Five or six lines is usually the practical limit before the chart becomes unreadable.
- Missing labels. Every line needs a clear label, either directly on the chart or in a legend.
- Truncated y-axis. Same problem as bar charts. Truncation exaggerates changes that may be trivial.
The Decision Framework
A simple way to pick the right chart, based on what kind of variables you have.
- One continuous variable. Histogram for showing distribution shape. Box plot for showing spread, skew, and outliers concisely.
- One categorical variable. Bar chart of counts or proportions.
- Two continuous variables. Scatter plot.
- One categorical, one continuous. Box plot of the continuous variable by group. A grouped bar chart of group means with error bars works too, but a box plot shows more.
- Two categorical variables. Grouped bar chart, stacked bar chart, or a heatmap if both variables have many levels.
- Continuous variable over time. Line chart. If you have multiple groups, one line per group.
- Three or more continuous variables. Scatter plot matrix or a correlation heatmap. For deeper analysis, apply dimension reduction (such as PCA) and plot the first two components as a 2D scatter plot.
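The framework above can be written as a small lookup function. This is one possible sketch: the function name, the argument names, and the returned strings are all made up for illustration, and the suggestions are starting points, not rules.

```python
def suggest_chart(variables, over_time=False):
    """Suggest a chart from a list of variable kinds.

    variables: list of strings, each "continuous" or "categorical".
    over_time: True when a continuous variable is tracked across time.
    """
    n_cont = variables.count("continuous")
    n_cat = variables.count("categorical")
    if not variables or n_cont + n_cat != len(variables):
        raise ValueError("each variable must be 'continuous' or 'categorical'")
    if over_time and n_cont >= 1:
        return "line chart"
    if n_cont >= 3 and n_cat == 0:
        return "scatter plot matrix or correlation heatmap"
    table = {
        (1, 0): "histogram or box plot",
        (0, 1): "bar chart",
        (2, 0): "scatter plot",
        (1, 1): "box plot by group",
        (0, 2): "grouped bar chart, stacked bar chart, or heatmap",
    }
    if (n_cont, n_cat) not in table:
        raise ValueError("combination not covered by this framework")
    return table[(n_cont, n_cat)]

print(suggest_chart(["continuous", "categorical"]))  # box plot by group
```

Combinations the framework does not cover, such as two continuous variables plus a categorical one, raise an error instead of returning a guess.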
Reporting Figures in APA Format
APA 7 has specific conventions for figures in research papers.
Figure numbers. Each figure gets a sequential number (Figure 1, Figure 2). The number appears in bold above the figure title.
Figure titles. Below the figure number, italicized, in title case. Brief but descriptive. Example:
Figure 1
Test Scores by Treatment Group
Notes below the figure. Use a note line to explain abbreviations, sample sizes, error bar definitions, and significance markers. Example:
Note. N = 120. Error bars represent 95% confidence intervals. * p < .05.
Reference figures in the text. When you mention a figure in the body of the paper, capitalize "Figure" and use the number. Example: "As shown in Figure 1, the treatment group..." Don't write "see the figure below" or "see the figure above" because position can change in typesetting.
Axis labels. Always include them. Always include units when relevant.
Common Visualization Mistakes That Get Flagged at Review
- Bar charts for continuous data. The most common mistake. Switch to histograms or box plots.
- Missing axis labels or units. Reviewers will ask. Always label both axes with what they represent and the units of measurement.
- Truncated y-axis without a clear note. Visually misleading. Either start at zero or use a clear break mark.
- Default software output without cleanup. Software defaults (gray backgrounds, gridlines everywhere, oversized legends) look unprofessional in published papers. Clean up the defaults before submission.
- Pie charts for anything. Pie charts make it hard to compare slice sizes accurately. A bar chart with the same data is almost always clearer. Some journals explicitly discourage pie charts.
- Color choices that exclude readers. Red-green color schemes are unreadable for people with the most common form of color blindness. Use color palettes that work in grayscale and for color-blind readers.
- Resolution too low for print. Submit figures at 300 DPI or higher. Screen-resolution images look blurry in print.
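One quick check on a palette is to see what happens to it in grayscale. The sketch below converts hex colors to an approximate gray level using the Rec. 601 luma weights. It is a rough check for grayscale printing only, not a simulation of color blindness, and the function name is hypothetical.

```python
def gray_level(hex_color):
    """Approximate grayscale value (0-255) of a '#RRGGBB' color, using Rec. 601 luma weights."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.299 * r + 0.587 * g + 0.114 * b

# A typical red-green pair.
red, green = "#FF0000", "#008000"
# Orange and blue from the Okabe-Ito palette, which was designed for color-blind readers.
orange, blue = "#E69F00", "#0072B2"

print(round(gray_level(red)), round(gray_level(green)))    # 76 75
print(round(gray_level(orange)), round(gray_level(blue)))  # 162 87
```

The red and green land on nearly the same gray, so the two series would be hard to tell apart in a black-and-white printout. The orange and blue stay clearly separated.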
Get Your Figures Reviewed
Figure problems are among the easiest comments for a reviewer to make and among the most time-consuming for an author to fix. Restructured charts, redone axes, and color palette changes all add up before resubmission.
Editor World's academic editing services include review of figure presentation and APA compliance. Editors check that figures have proper numbering, descriptive titles, complete labels, and notes that explain the relevant statistics. For researchers preparing journal submissions, this kind of review catches the figure problems that reviewers flag first.
A free sample edit is available from any editor before you commit. Browse editor profiles by subject expertise to find someone whose background matches your field.
Frequently Asked Questions
When should I use a box plot instead of a bar chart?
Use a box plot when your data is continuous. Use a bar chart only when your data is categorical or when you're showing counts and proportions. The most common visualization mistake in research papers is showing means of continuous data as bar charts. A bar chart of means hides the distribution, the spread, and the outliers. A box plot shows all three at once. If you're tempted to show a bar of means with error bars, a box plot almost always tells the story better.
What's the difference between a histogram and a bar chart?
A histogram shows the distribution of a continuous variable. A bar chart shows values across categorical groups. The visual difference is that histogram bars touch each other (because the x-axis is continuous) while bar chart bars have gaps between them (because the categories are distinct). The conceptual difference is that histograms show shape and spread, while bar charts show comparisons between named groups.
How do I read a box plot?
The box itself runs from the 25th to the 75th percentile. The line inside the box is the median. The whiskers extend from the box to the smallest and largest values that aren't outliers. Individual dots beyond the whiskers are outliers, usually defined as values more than 1.5 times the interquartile range past the box. If the median line sits in the middle of the box, your data is roughly symmetric. If it sits closer to one end, your data is skewed, with the longer tail on the opposite side. A median near the bottom of the box suggests a right (positive) skew, and a median near the top suggests a left (negative) skew.
Should I ever use a pie chart?
Almost never in academic research. Pie charts make it hard to compare slice sizes accurately. Readers can't tell whether a 23% slice is bigger or smaller than a 26% slice without labels. A bar chart of the same data is faster to read and more accurate. Some journals explicitly discourage pie charts in submitted figures. The only situation where a pie chart might be defensible is when you have exactly two categories that need to sum to 100%, but even then a bar chart usually works better.
What does it mean if my scatter plot looks like a cloud?
A cloud with no clear pattern means the two variables show no obvious linear relationship. They might be independent (no relationship at all), or they might be related in some non-linear way that is hard to see on the raw scale. Try plotting transformations (log, square root) to see whether a pattern emerges. If the cloud still looks random, don't run a correlation or linear regression and report a "significant" result on the assumption that a relationship is hiding somewhere. The plot is telling you something real.
Why should I plot my data before running statistical tests?
Statistical tests assume things about your data. Many parametric tests assume the data is roughly normal. Some assume equal variances across groups. Some break down when outliers are present. A quick visual check tells you whether your data meets these assumptions before you run a test that might be invalid. The summary statistics alone can hide patterns that completely change which test you should use. A 30-second histogram can save you from publishing a result based on the wrong test.
Page last reviewed: May 2026. Editor World, founded in 2010 by Patti Fisher, PhD, is a professional human-only writing, editing, and proofreading marketplace serving researchers and students worldwide. BBB A+ accredited since 2010 with 5.0/5 Google Reviews and 5.0/5 Facebook Reviews.