Using Correlation Coefficients in Research Papers

Correlational research is one of the most accessible and practical research methods available to academics, especially students working with limited time and budgets. Understanding how to gather, calculate, and report correlation coefficients correctly is an essential skill for writing credible, well-structured research papers across finance, science, engineering, and the social sciences.


This article explains what correlation coefficients are, when to use correlational research, which type of correlation coefficient fits your data, how to analyze and present your findings accurately, and how to report results in the conventions that journal editors and reviewers expect. It also walks through worked examples from each major research discipline.


What Is a Correlation Coefficient?

A correlation measures the linear relationship between two variables. The correlation coefficient describes both the strength and the direction of that relationship, expressed as a value ranging from -1 to +1. A value of 0 indicates no linear relationship between the variables, while a value of -1 or +1 indicates a perfect linear relationship (negative or positive, respectively). Values closer to 0 indicate a weaker relationship, and values closer to -1 or +1 indicate a stronger one.


The purpose of correlational research is to determine whether a linear relationship exists between two variables. For example, you might want to know whether the number of hours a student studies correlates with the grade they receive. By surveying students in a class, you could collect data on weekly study hours and final grades, then apply a correlation coefficient formula to get a value between -1 and +1. That value would tell you whether grades tend to increase as study hours increase, decrease as study hours increase, or show no discernible pattern based on hours studied.
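
As a minimal sketch of the calculation described above, the Python snippet below computes Pearson's r directly from its definition: the sum of cross-products of deviations divided by the square root of the product of the summed squared deviations. The study-hours and grade values are invented for illustration, and the function name pearson_r is an assumption of this example rather than part of any particular library.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations around the two means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical survey data: weekly study hours and final grades for 8 students
study_hours = [2, 4, 5, 7, 8, 10, 12, 15]
final_grades = [58, 62, 70, 68, 75, 80, 85, 91]

r = pearson_r(study_hours, final_grades)
print(f"r = {r:.2f}")
```

A positive result here would mean that grades tend to rise with study hours in this invented sample; it says nothing about why.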


It's important to understand that correlation isn't the same as causation. Even if correlational research shows that grades improve as students study more, you can't conclude that studying more directly causes better grades. There are often other explanations for the results you find, which we'll cover in the analysis section below.


Why is correlational research useful?

If correlational research can't establish causation, you might wonder why it's worth doing at all. Here are the main reasons researchers choose correlational methods:


  • It's faster than experimental research. You can gather data from natural, non-experimental settings in a relatively short time, without needing to design and run a controlled experiment.
  • It's more affordable. Correlational research typically requires fewer resources than experimental research, making it a practical option for students and researchers working with limited budgets.
  • It's the only option in many real-world contexts. Many research questions in finance, epidemiology, and the social sciences can't be tested experimentally for ethical or practical reasons. You can't randomly assign people to different income levels, smoking habits, or stock portfolios. Correlational research lets you investigate these questions using observational data.

When should you use correlational research?

There are several situations where correlational research isn't just useful but the most appropriate choice:


  • Investigating non-causal relationships. You won't always expect a causal relationship between two variables, but knowing whether they correlate can still be valuable for building a broader understanding of a topic.
  • Supporting causal theories. When it's too expensive, impractical, or unethical to run experiments that would establish causation, a strong correlational finding can lend support to a causal theory.
  • Testing new measurement tools. If the correlational relationship between two variables is already well established, you can use correlational research on those variables with new measurement instruments to assess their validity and reliability.
  • Identifying predictors before designing an experiment. Correlational analysis at an early stage of a research program helps narrow the field of candidate variables before committing resources to a controlled experimental design.

Types of Correlation Coefficients

The right correlation coefficient depends on the type of data you've collected and whether it meets certain statistical criteria. Each coefficient has a specific formula and is suited to a specific kind of dataset. Choosing the wrong coefficient is one of the most common reasons reviewers flag a methods section, so the choice deserves careful attention.


Which correlation coefficient should you use?

  • Pearson's r. Used for the linear relationship between two continuous variables that are randomly sampled and approximately normally distributed. Your data must meet these criteria for Pearson's r to be an accurate measure. Pearson's r is the default choice in most quantitative finance, science, and engineering research where the variables are interval or ratio measurements.
  • Spearman's rho. Used for two continuous or ordinal variables that don't need to be normally distributed. It's the most common alternative when your data doesn't meet the criteria for Pearson's r. Spearman's rho is based on the ranked order of data rather than the actual values, making it the standard choice for survey data with Likert-scale responses, ordinal rankings, or skewed distributions common in the social sciences.
  • Kendall's tau. A rank-based alternative to Spearman's rho that compares pairs of observations and counts how many are ordered the same way on both variables (concordant) versus the opposite way (discordant). Kendall's tau is often preferred for small datasets and for datasets that contain many tied ranks, as Spearman's rho can give misleading values in those situations.
  • Phi coefficient. Measures the strength of the relationship between two categorical variables in a 2x2 contingency table. The phi coefficient is common in epidemiology and clinical research where both variables are dichotomous (e.g., exposure yes or no, outcome yes or no).
  • Cramer's V. Measures the strength of the relationship between two categorical variables in contingency tables larger than 2x2. Cramer's V is used in marketing research, political polling, and any application where the variables have multiple discrete categories rather than just two.
  • Point-biserial correlation. A specific case of Pearson's r used when one variable is continuous and the other is dichotomous. The point-biserial correlation is common in psychology and educational research, for example when correlating test scores (continuous) with pass/fail outcomes (dichotomous).
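
To make two of the coefficients above concrete, here is a hedged sketch in plain Python. It treats Spearman's rho as Pearson's r applied to tie-averaged ranks and computes the phi coefficient from the four cells of a 2x2 table. The helper names (average_ranks, spearman_rho, phi_coefficient) and all data values are assumptions made for this illustration.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical survey: 5-point Likert responses and a skewed income variable
likert = [1, 2, 2, 3, 4, 4, 5]
income = [21, 35, 30, 48, 52, 250, 90]
print(f"Spearman rho = {spearman_rho(likert, income):.2f}")

# Hypothetical 2x2 table: exposed/unexposed (rows) by outcome yes/no (columns)
print(f"phi = {phi_coefficient(20, 10, 10, 20):.2f}")
```

Because Spearman's rho only uses ranks, the single very large income value in the invented data has no more influence than any other top-ranked observation.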

How to Collect Correlational Data

Like experimental research, correlational research uses quantitative methods. The key difference is that variables in correlational research are observed rather than manipulated. There are three main approaches to collecting correlational data:


  1. Surveys. Questionnaires let you collect data quickly from your target population. They can be administered in person, online, by mail, or by phone, making them one of the most flexible data collection methods available. Survey-based correlational research is dominant in the social sciences and consumer research.
  2. Observation. This approach involves recording behavior or phenomena as they occur in a natural environment, including descriptions of the setting, events, and actions being observed. Observational data collection is common in epidemiology, ecology, and behavioral research.
  3. Secondary sources. Existing datasets collected for other purposes can be used for correlational research. This is the fastest and least expensive approach, but it comes with a tradeoff: since you didn't collect the data yourself, you have no control over its reliability or validity. Secondary data sources include CRSP and Compustat in finance, the Survey of Consumer Finances and the Panel Study of Income Dynamics in economics, the Materials Project database in engineering, and ICPSR-archived datasets in the social sciences.

How to Analyze and Interpret Correlation Coefficients

Analyzing correlational data begins with plotting your data and calculating the correlation coefficient. The coefficient gives you a value representing the strength and direction of the relationship, while graphing the data gives you a visual picture of what that relationship actually looks like. Always plot your data before running the calculation. A scatter plot reveals nonlinear relationships, outliers, and clusters that the correlation coefficient alone will hide.


The table below provides general guidelines for interpreting the strength of a correlation coefficient:


Absolute Value of the Correlation Coefficient    Interpretation
0.00 to 0.10                                     Negligible
0.10 to 0.39                                     Weak
0.40 to 0.69                                     Moderate
0.70 to 0.89                                     Strong
0.90 to 1.00                                     Very Strong


These thresholds are general guidelines, not universal standards, and what counts as a meaningful correlation differs by field. In physics and chemistry, where measurement precision is high and underlying relationships are often deterministic, researchers expect very strong correlations and treat anything below 0.90 with skepticism. In psychology and the social sciences, where human behavior is influenced by many factors simultaneously, a correlation of 0.30 may represent an important and publishable finding. Calibrate your interpretation to the conventions of your field and target journal.


There are two important caveats to keep in mind when interpreting your results.


First, a value near 0 doesn't mean there's no relationship between the variables at all. It means there's no linear relationship. There could still be another type of relationship, such as a quadratic one. Graphing your data before running the analysis will help you spot any nonlinear patterns. The classic illustration is Anscombe's quartet, four datasets with identical Pearson correlations of approximately 0.816 but radically different visual structures including curved relationships, single outlier-driven correlations, and clustered patterns. Anscombe's quartet is a reminder that the correlation coefficient is a summary, not a substitute for visual inspection.
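
The point about Anscombe's quartet can be checked numerically. The sketch below uses the four datasets as published in Anscombe (1973) and computes Pearson's r for each; the pearson_r helper is written out here only to keep the example self-contained.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet: datasets I to III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson_r(x, y) for name, (x, y) in quartet.items()}
for name, r in correlations.items():
    print(f"Dataset {name}: r = {r:.3f}")
```

All four values come out at roughly 0.816, yet only a scatter plot of each dataset shows that the second is curved, the third is a near-perfect line disturbed by one outlier, and the fourth depends entirely on a single point.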


Second, correlation isn't causation. When you find a correlational relationship, there are often multiple explanations for it that weren't accounted for in your research.


One common issue is the directionality problem. Using the studying and grades example: if students who study more get better grades, you could equally argue that getting better grades motivates students to study more. The data alone can't tell you which direction the relationship runs.


Another issue is the possibility of a third variable. It's possible that a separate factor influences both variables simultaneously. In the studying and grades example, students who sleep more might both study more and earn better grades, meaning that sleep, not studying, is the underlying driver of both outcomes. Third-variable problems (also called confounding) are the reason correlational research can rarely substitute for randomized experimental designs when causal claims are the goal.
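
One common way to probe a suspected third variable, offered here as an illustration rather than as a method this article prescribes, is the first-order partial correlation: the correlation between two variables after the linear influence of a third has been removed from both. In the sketch below, the data are deliberately constructed so that sleep fully accounts for the association between study hours and grades; the variable names and values are hypothetical.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Constructed (not real) data: sleep drives both study hours and grades
sleep = [5, 5, 6, 6, 7, 7, 8, 8]           # hours per night
study = [11, 9, 13, 11, 15, 13, 17, 15]    # hours per week
grades = [72, 68, 76, 80, 88, 84, 92, 96]  # final grade

r_study_grades = pearson_r(study, grades)
r_partial = partial_r(
    r_study_grades,
    pearson_r(study, sleep),
    pearson_r(grades, sleep),
)
print(f"r(study, grades)         = {r_study_grades:.2f}")
print(f"r(study, grades | sleep) = {r_partial:.2f}")
```

In this constructed case the raw correlation is strong while the partial correlation is essentially zero. Real data are rarely this clean, and a partial correlation only adjusts for the third variables you thought to measure.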


Worked Examples Across Disciplines

The application of correlation coefficients differs significantly across research disciplines. The examples below illustrate how researchers in finance, science, engineering, and the social sciences select, calculate, and interpret correlation coefficients in their respective fields.


Finance: stock returns and the Fama-French factors

In empirical asset pricing, correlation coefficients underpin the relationship between individual stock returns and systematic risk factors. Following the framework introduced by Eugene Fama and Kenneth French, researchers correlate excess stock returns with three factors: the market premium, the size factor (SMB, small minus big), and the value factor (HML, high minus low book-to-market). These correlations are typically calculated using Pearson's r on monthly return data from CRSP, with sample sizes ranging from 60 monthly observations for individual stocks to several thousand for fund-level analyses.


A study correlating monthly returns of a value-tilted equity fund with the HML factor over a 20-year period might report a Pearson correlation of 0.72, indicating a strong positive relationship and supporting the fund's value style classification. A correlation of 0.15 would suggest the fund's name and stated strategy don't match its actual factor exposure, a finding with material implications for institutional investors. Finance journals such as the Journal of Finance, the Journal of Financial Economics, and the Review of Financial Studies expect correlation reporting to include the full variance-covariance matrix among factors and the time-series consistency of the correlation across market regimes.


Science: dose-response relationships in clinical research

In clinical and pharmacological research, correlation coefficients quantify the relationship between drug exposure and physiological response. A study evaluating a new antihypertensive medication might correlate plasma drug concentration with reduction in systolic blood pressure across 200 patients. Because both variables are continuous and approximately normally distributed at therapeutic doses, Pearson's r is the appropriate choice.


A reported Pearson r of 0.65 with a 95 percent confidence interval of 0.56 to 0.72 and p less than 0.001 would indicate a moderate positive relationship between drug concentration and blood pressure reduction. The clinical significance of this correlation depends on context. In hypertension research, a correlation of 0.65 is meaningful because blood pressure is influenced by many factors beyond a single drug, including diet, sodium intake, sleep, and stress. In contrast, a study of a chemical reaction rate's correlation with temperature would be expected to produce correlations above 0.95, because the underlying physics is deterministic. The New England Journal of Medicine, the Lancet, and JAMA expect dose-response correlations to be accompanied by sample size, confidence intervals, p-values, and discussion of potential confounders such as renal function, age, and concomitant medications.


Engineering: materials properties and processing parameters

In materials science and engineering, correlation coefficients describe the relationship between processing parameters and resulting material properties. A study optimizing the strength of a 3D-printed titanium alloy component might correlate laser power (continuous, watts) with ultimate tensile strength (continuous, megapascals) across 50 specimens manufactured at varying laser power settings. Because both variables are continuous, Pearson's r is appropriate provided the relationship is approximately linear over the tested range. Because the underlying physical relationship is deterministic apart from manufacturing variability, high correlations are expected.


A study reporting a Pearson r of 0.91 between laser power and tensile strength would indicate a very strong relationship, with laser power explaining roughly 83 percent of the variance in tensile strength (the coefficient of determination, r squared, equals 0.83). Engineering journals such as Materials Science and Engineering A, the Journal of Materials Processing Technology, and Acta Materialia expect correlation findings in materials research to be accompanied by mechanistic explanations grounded in the physics of the process. A correlation without a mechanism is treated as preliminary. Correlations are also commonly reported alongside response surface models, design of experiments analyses, and finite element simulation comparisons that integrate the correlational finding into a broader analytical framework.


Social sciences: financial behavior and demographic variables

In behavioral economics and consumer research, correlation coefficients describe relationships between financial behaviors and demographic, psychological, or attitudinal variables. Fisher and Yao (2017), in their study of gender differences in financial risk tolerance, used the Survey of Consumer Finances to examine correlations between risk tolerance and a range of variables including income, education, age, and household composition. The use of Spearman's rho is common in such research because Likert-scale risk tolerance measures are ordinal rather than continuous, and income distributions are typically skewed rather than normal.


A study correlating self-reported financial risk tolerance (5-point ordinal scale) with household income (continuous, log-transformed) across 6,500 households might report a Spearman rho of 0.28 with p less than 0.001. The correlation is statistically significant due to the large sample size, but its absolute magnitude is modest. In behavioral and social science research, this is a meaningful and publishable finding because risk tolerance is influenced by many factors beyond income, including personality, life stage, marital status, financial literacy, and cultural background. Journals such as the Journal of Consumer Research, the Journal of Financial Counseling and Planning, and the Journal of Family and Economic Issues expect such correlations to be reported alongside multivariate regression analyses that control for the confounding variables identified in the literature, recognizing that bivariate correlations alone rarely answer social science research questions.


Reporting Correlation Coefficients in Your Research Paper

When you report correlation coefficients in a research paper, include the type of coefficient used, the value, the sample size, and the p-value to indicate statistical significance. Be precise about the direction and strength of the relationship, and always acknowledge the limitations of correlational findings, including the inability to establish causation. The standard reporting format in most quantitative journals follows APA conventions: r(N-2) = value, p = value, with the degrees of freedom in parentheses and the correlation coefficient and p-value to two or three decimal places.


Beyond the basic value and p-value, contemporary journal expectations have evolved. Most quantitative journals now expect:


  • Confidence intervals. Report the 95 percent confidence interval for the correlation coefficient. A correlation of 0.45 with a confidence interval of 0.30 to 0.58 is more informative than a correlation of 0.45 alone, because the confidence interval communicates the precision of the estimate.
  • Effect size interpretation. Don't just state that the correlation is statistically significant. Statistical significance with a large sample size can mean a trivial effect. Effect size interpretation places the correlation in context relative to your field's conventions.
  • Coefficient of determination (r squared). Reporting r squared alongside r helps readers understand how much variance in one variable is explained by the other. An r of 0.50 corresponds to an r squared of 0.25, meaning 25 percent of variance is explained, leaving 75 percent unexplained.
  • Visual presentation. Include a scatter plot showing the relationship visually, especially when the correlation is reported as part of the main findings. Modern journal practice in finance, science, engineering, and the social sciences expects both numerical and visual presentation.
  • Discussion of limitations. Acknowledge the limitations explicitly: the inability to establish causation, the directionality problem, the possibility of confounding variables, and any sample-specific limitations that constrain generalization.
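
As a rough sketch of how several of these reporting elements can be produced together, the snippet below computes an approximate confidence interval for Pearson's r using the Fisher z-transformation and assembles a one-line summary. The function names are assumptions of this example, the inputs r = 0.65 and n = 200 are illustrative, and the p-value is left out because an exact value requires the t distribution, which statistical software provides.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for Pearson's r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)               # transform r to the z scale
    se = 1 / math.sqrt(n - 3)       # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Back-transform the interval limits to the r scale
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

def format_report(r, n, level=0.95):
    """One-line summary: degrees of freedom, r, confidence interval, r squared."""
    lo, hi = fisher_ci(r, n, level)
    return (f"r({n - 2}) = {r:.2f}, {level:.0%} CI [{lo:.2f}, {hi:.2f}], "
            f"r squared = {r * r:.2f}")

print(format_report(0.65, 200))
```

The interval is an approximation that assumes bivariate normal data; check your target journal's style guide for the exact formatting it expects.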

Common Mistakes to Avoid

Reviewers commonly flag the following errors in correlation reporting. Avoiding them improves the credibility of your manuscript and reduces revision burden.


  • Reporting Pearson's r when assumptions aren't met. Pearson's r assumes normally distributed data, linear relationships, and continuous variables. When these assumptions aren't met, Spearman's rho or Kendall's tau is more appropriate. Reporting Pearson's r on skewed or ordinal data is one of the most common methodological errors flagged in peer review.
  • Implying causation in correlational findings. Phrases like "X caused Y," "X led to Y," or "X resulted in Y" are inappropriate when the underlying analysis is correlational. Use language like "X was associated with Y," "X correlated with Y," or "Higher values of X were observed alongside higher values of Y."
  • Ignoring outliers. A single outlier can shift a correlation coefficient substantially, particularly in small samples. Identify outliers visually through scatter plots, decide whether they're errors or legitimate observations, and report your decisions transparently.
  • Cherry-picking correlations. If you calculate dozens of correlations and report only the significant ones, you're inflating the false discovery rate. Apply Bonferroni or Benjamini-Hochberg corrections when running multiple comparisons, and report all correlations or specify clearly which subset you're reporting and why.
  • Conflating statistical significance with practical importance. A correlation of 0.05 with a sample of 10,000 will be statistically significant (p less than 0.001), but it explains only 0.25 percent of variance. Statistical significance alone doesn't make a finding meaningful.
  • Failing to graph the data. A correlation coefficient is a summary statistic. Without a scatter plot, readers can't see whether the relationship is genuinely linear, whether outliers are driving the result, or whether the data clusters in unexpected ways.
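
The multiple-comparison corrections mentioned above can be sketched in a few lines. The example below applies the Bonferroni rule and the Benjamini-Hochberg step-up procedure to a set of ten made-up p-values; the function names and the p-values are assumptions for illustration only.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value is at or below (k / m) * q
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below that k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Hypothetical p-values from ten correlations computed on the same dataset
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

uncorrected = sum(p < 0.05 for p in p_values)
print("Significant without correction:", uncorrected)
print("Significant after Bonferroni:", sum(bonferroni(p_values)))
print("Significant after Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
```

With these invented values, five correlations pass an uncorrected 0.05 threshold, but only two survive Benjamini-Hochberg and one survives Bonferroni, which is why reporting only the uncorrected "hits" overstates the evidence.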

Final Thoughts

Correlational research is a genuinely valuable method for academics at every level. It's fast, affordable, and well suited to a wide range of research questions across finance, science, engineering, and the social sciences. With a solid understanding of how to select, calculate, interpret, and report correlation coefficients, you'll be well-equipped to design your own correlational study and report your findings clearly and accurately. The most important skills to develop are matching the correlation coefficient to your data, recognizing the limitations of correlational findings, and reporting results in the conventions your target journal expects.


Frequently Asked Questions

What is the difference between correlation and causation?

Correlation describes the strength and direction of a linear relationship between two variables. Causation describes a directional mechanism by which one variable produces a change in another. A correlation can exist without causation, and a causal relationship can exist without an obvious correlation if the relationship is nonlinear or moderated by other variables. Correlational research can identify associations between variables but can't establish causation, because the directionality of the relationship is unknown and confounding variables may be driving both observed variables. Establishing causation typically requires a randomized controlled experiment or a causal inference framework such as instrumental variables, regression discontinuity, or difference-in-differences analysis.


When should I use Pearson's r versus Spearman's rho?

Pearson's r is appropriate when both variables are continuous, normally distributed, and the relationship between them is approximately linear. Spearman's rho is appropriate when one or both variables are ordinal, when the data aren't normally distributed, or when the relationship is monotonic but not necessarily linear. Spearman's rho works on ranked data rather than the actual values, making it more robust to outliers and non-normal distributions. For survey research using Likert scales, Spearman's rho is typically the right choice. For continuous physical or financial measurements, Pearson's r is typically the right choice. When in doubt, run both and report the one that matches your data type and your field's conventions.


How do I report a correlation coefficient in APA format?

The standard APA format for reporting a correlation coefficient is r(N-2) = value, p = value, where N-2 is the degrees of freedom (sample size minus two for Pearson's r), the correlation coefficient is reported to two or three decimal places without a leading zero (.45 rather than 0.45), and the p-value is reported to three decimal places or as p < .001 if smaller. For example: r(98) = .45, p < .001. Modern reporting also expects the 95% confidence interval, the effect size interpretation, and the coefficient of determination (r squared) where relevant. Specific journal style guides may differ, so check your target journal's instructions before finalizing your manuscript.


What sample size do I need for correlational research?

Sample size depends on the expected correlation strength, the desired statistical power, and the significance level. For a medium effect size (r approximately 0.30), 80% power, and an alpha of 0.05, a sample size of approximately 84 observations is typically sufficient. For a small effect size (r approximately 0.10), the required sample size grows to approximately 782 observations. For a large effect size (r approximately 0.50), approximately 28 observations are sufficient. Power analysis software such as G*Power, R's pwr package, or commercial alternatives like SAS or SPSS can calculate exact sample size requirements for specific research designs. Journals in finance and the social sciences increasingly expect a priori power calculations to be reported in the methods section.
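
For a quick sanity check before turning to dedicated power software, the required sample size can be approximated with the Fisher z-transformation. The sketch below is one such approximation for a two-sided test, not the exact method those packages use, so its results can differ by a few observations from the figures quoted above. The function name sample_size_for_r is an assumption of this example.

```python
import math
from statistics import NormalDist

def sample_size_for_r(r, power=0.80, alpha=0.05):
    """Approximate n needed to detect a population correlation r (two-sided test).

    Uses the Fisher z approximation: n = ((z_alpha + z_power) / atanh(r))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)

for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f}: approximately {sample_size_for_r(r)} observations")
```

The steep growth in required sample size as the expected correlation shrinks is the main practical message: halving the expected effect roughly quadruples the sample you need.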


How do I handle outliers in correlational research?

Identify outliers through visual inspection of scatter plots and statistical methods such as Cook's distance, leverage values, and standardized residuals. Once identified, determine whether each outlier represents a measurement error (in which case it should be corrected or removed) or a legitimate observation that's simply far from the cluster (in which case removal is more controversial). When outliers are legitimate observations, options include reporting the analysis with and without the outliers, using Spearman's rho or Kendall's tau which are less sensitive to outliers, or applying robust correlation methods such as the Winsorized correlation or the percentage-bend correlation. Whatever approach is taken, document and justify the decision transparently in the methods section so reviewers can evaluate the analysis on its merits.
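
The option of reporting the analysis with and without outliers is easy to illustrate. In the sketch below, eight invented observations show almost no linear relationship, and a single extreme point is then added; all values are hypothetical.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements with no clear linear pattern
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 2, 5, 4, 6, 3]

# One extreme observation far from the main cluster
outlier_x, outlier_y = 25, 30

r_without = pearson_r(x, y)
r_with = pearson_r(x + [outlier_x], y + [outlier_y])

print(f"Pearson r without the outlier: {r_without:.2f}")
print(f"Pearson r with the outlier:    {r_with:.2f}")
```

A single point is enough to turn a near-zero correlation into one that looks very strong, which is why both versions, together with the reason for keeping or removing the point, belong in the methods section.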


Can correlation coefficients be used for time series data?

Standard Pearson and Spearman correlation coefficients can be calculated on time series data, but they often produce misleading results because of autocorrelation, where each observation depends on previous observations. In financial time series, for example, the apparent correlation between two stock prices may simply reflect that both have generally trended upward over time rather than that they're genuinely related. Specialized methods address this issue. Cross-correlation functions measure the correlation between two series at different lags, typically after differencing or prewhitening to remove autocorrelation. Cointegration analysis tests whether two non-stationary time series share a common long-run trend. Granger causality tests examine whether one time series helps predict another. For empirical finance research, the Newey-West standard error correction and the Hansen-Hodrick adjustment are commonly applied. Time series correlation analysis requires careful attention to stationarity, structural breaks, and the underlying economic or physical mechanism connecting the series.
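
The trending-series problem can be demonstrated with two synthetic series that share nothing except an upward drift. The sketch below compares the correlation of the levels with the correlation of the period-to-period changes (first differences), one simple way to remove a trend. The series are generated from arbitrary formulas chosen for this illustration and do not represent real prices.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_differences(series):
    """Change from each observation to the next."""
    return [b - a for a, b in zip(series, series[1:])]

# Two synthetic series: each has its own upward trend plus an unrelated oscillation
periods = range(60)
series_a = [100 + 2 * t + 5 * math.sin(1.3 * t) for t in periods]
series_b = [50 + 3 * t + 7 * math.cos(2.1 * t) for t in periods]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(first_differences(series_a), first_differences(series_b))

print(f"Correlation of levels:  {r_levels:.2f}")
print(f"Correlation of changes: {r_changes:.2f}")
```

The levels correlate almost perfectly because both series rise over time, while the changes are close to uncorrelated. Differencing is only a first check; the specialized methods named above are the appropriate tools for formal analysis.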


What is the coefficient of determination and how does it relate to the correlation coefficient?

The coefficient of determination, denoted r squared, is the square of the correlation coefficient and represents the proportion of variance in one variable that's explained by the other. A correlation of 0.50 corresponds to an r squared of 0.25, meaning 25% of variance is explained, leaving 75% unexplained. A correlation of 0.90 corresponds to an r squared of 0.81, explaining 81% of variance. Reporting r squared alongside r helps readers understand the practical importance of the relationship. A statistically significant correlation of 0.10 corresponds to an r squared of just 0.01, explaining only 1% of variance, which is rarely a finding of practical importance even when the p-value is small. The coefficient of determination is particularly useful in regression contexts where it generalizes to multiple predictors.
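
The arithmetic above, and the gap between statistical significance and variance explained, can be reproduced with a short script. The p-value here uses a normal approximation based on the Fisher z-transformation rather than the exact t-test that statistical packages report, and the sample sizes paired with each correlation are hypothetical.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Two-sided p-value for H0: no correlation, via the Fisher z normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# (correlation, hypothetical sample size)
examples = [(0.10, 1000), (0.50, 100), (0.90, 30)]

for r, n in examples:
    r_squared = r * r
    print(f"r = {r:.2f}, n = {n}: r squared = {r_squared:.2f} "
          f"({r_squared:.0%} of variance explained), "
          f"approx. p = {approx_p_value(r, n):.2g}")
```

With a large enough sample, even the correlation of 0.10 yields a small p-value while still explaining only about 1 percent of the variance.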


How do I write the limitations section for a correlational study?

A limitations section for correlational research should explicitly acknowledge the inability to establish causation, the potential for the directionality problem (whether X causes Y, Y causes X, or both), and the possibility of confounding by third variables that influence both observed variables. Discuss the sample's generalizability: did you study a specific population, time period, or institutional context that might limit how broadly your findings apply? Address measurement limitations: were the variables measured by valid and reliable instruments? Discuss any nonresponse, attrition, or selection bias that might affect the correlations observed. Modern journals expect a substantive limitations section rather than a brief paragraph, often with specific suggestions for follow-up research that could address the limitations. The limitations section is one of the most-read sections of the paper for reviewers and should reflect serious engagement with the constraints of your design.


Content reviewed by Editor World editorial staff. Editor World provides professional English editing and proofreading services for researchers, students, business professionals, and authors worldwide. References include Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17 to 21; Fama, E. F., and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3 to 56; and Fisher, P. J., and Yao, R. (2017). Gender differences in financial risk tolerance. Journal of Economic Psychology, 61, 191 to 202.