It sounds like a simple, straightforward proposition: Scientists should disclose how they collect and analyze the data supporting their scientific publications.

Yet as Wharton operations and information management professors Joseph Simmons and Uri Simonsohn and UC Berkeley colleague Leif Nelson point out in a recent research paper, too much emphasis is placed on getting research results published in respectable journals, without worrying enough about whether the evidence backs up those findings. Indeed, the authors write, “it is unacceptably easy to publish ‘statistically significant’ evidence consistent with any hypothesis.”

Their stance is hardly new. Not just academics but the public at large have often looked skeptically at published studies that in some cases defy common sense. One problem with that skepticism, Simonsohn says, is that it ends up calling into question even solid research that can lead to new insights about everything from investment behavior to product marketing to consumer psychology.

Because it is so easy to find evidence for any hypothesis, and because counterintuitive findings are more likely to get noticed and praised, “the temptation is to [conduct research] that in the end doesn’t contribute to society very much,” Simonsohn notes. “Instead of asking questions that will lead to important findings in our respective areas, too often we ask questions that are more likely to get media attention. We are missing out on important truths about the world.”

For example, he says, “what if there is a good way to influence savings rates or discover basic realities about how we form judgments? As it is now, we are not providing the right incentive to scientists” to pursue such lines of inquiry.

It comes down to a question of methodology, as Simmons and his colleagues point out in their paper titled “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.”

‘The Most Costly Error’

The three researchers suggest that “the most costly error” in the scientific process — which includes generating hypotheses, collecting data and examining whether or not the data are consistent with those hypotheses — is a false positive: a finding for which statistically significant evidence is obtained even though the effect is not real. False positives, the authors note, are persistent; they waste resources “by inspiring investment in fruitless research programs”; and they can eventually create credibility problems in any field known for publishing them.

False positives will necessarily happen sometimes, but they occur too often because researchers have many decisions to make during the course of collecting and analyzing data. Furthermore, the authors note, it is “common practice for researchers to search for a combination of analytic alternatives that yields ‘statistical significance,’ and then to report only what worked. The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding” is high. Indeed, a researcher is often “more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not.”
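
The arithmetic behind that claim is easy to see. As a rough illustration (the code below is a hypothetical sketch, not material from the paper), suppose a researcher measures several unrelated outcomes for two groups drawn from the same population and counts a study as a success whenever any single test comes out “significant” at the conventional 5% level:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def false_positive_rate(k, n=20, alpha=0.05, sims=10_000):
        """Chance of at least one 'significant' t-test when the true effect is zero."""
        hits = 0
        for _ in range(sims):
            # Two groups of n people drawn from the same distribution,
            # each measured on k unrelated outcome variables.
            a = rng.standard_normal((n, k))
            b = rng.standard_normal((n, k))
            pvals = stats.ttest_ind(a, b).pvalue  # one p-value per outcome variable
            if (pvals < alpha).any():             # report whichever analysis "worked"
                hits += 1
        return hits / sims

    for k in (1, 3, 5, 10):
        print(k, "analyses:", false_positive_rate(k))
    # For independent tests this approaches 1 - 0.95**k:
    # roughly 5%, 14%, 23% and 40%.

With ten analyses to choose from, roughly four out of ten studies of a nonexistent effect would still produce something that looks publishable.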

Add to that a researcher’s desire to find a statistically significant result — the less intuitive the better. Or, as the authors write, “a large literature documents that people are self-serving in their interpretation of ambiguous information and remarkably adept at reaching justifiable conclusions that mesh with their desires” — and that get their work published.

Simonsohn gives a hypothetical example. Suppose a researcher is trying to help marketers figure out what would make a television or video ad appealing to young people. Say one possibility is to set it to the beat of a popular song. At that point, because we already know that this is a good idea, the researcher would merely study whether it is worth paying for the rights to that music based on anticipated revenues from the ad. “But that’s boring,” says Simonsohn. “Compare it to putting subliminal diagonal yellow lines on the ad as a way to increase sales. If you have two papers on this, the one that estimates the exact value of paying for the rights will not get published because it’s not particularly interesting. The other one will, because it is less intuitive.” Asking less obvious questions makes sense because that is where information has the most value, Simonsohn notes, “but as soon as having correct information is no longer a requirement for studies to work and get published, then asking less obvious questions will tend to lead to less truthful findings. You, as a researcher, will go for the yellow lines. It distracts us from the questions that will lead to more substantive, verifiable findings.”

While Simonsohn acknowledges that academics have long been concerned about the liberties that some researchers take when analyzing data, he highlights three unique contributions that his paper makes. First, he and his co-authors offer a simple, low-cost solution to the problem that does not interfere with the work of scientists already doing everything right — asking them to disclose what they did in ways that would add only a few dozen words to most articles. Second, they demonstrate just how big the problem can get: while it was previously suspected that the consequences of taking these liberties were relatively minor, the authors show that doing so can increase the odds of finding evidence for a false hypothesis to over 50%. And third, the authors ran an actual experiment to illustrate their point about how data are manipulated to achieve a desired outcome.

In that demonstration, they “showed” that listening to the Beatles song “When I’m 64” made people younger — not just feel younger, but be younger (by more than a year). Simonsohn and his colleagues used real participants and accepted statistical analyses to reach conclusions that are nevertheless obviously false. They did this by conducting analyses as the data were coming in — and stopping as soon as they got the result they wanted. They also studied the reverse effect on people listening to the children’s tune “Hot Potato,” but did not report their results because the prediction did not pan out for that condition. Both of these procedures — monitoring data and dropping conditions — are not only accepted by journals, but are often required by them in order to get authors to make their studies “simpler.”
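
To get a feel for how much the stop-as-soon-as-it-works rule alone can distort results, consider a hypothetical sketch (again, illustrative code rather than the authors’ own) of a study in which there is no real effect, but a t-test is rerun every time a few more participants arrive and data collection stops at the first “significant” result:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def peeking_false_positive_rate(start=10, step=5, max_n=50, alpha=0.05, sims=5_000):
        """Chance of a 'significant' result when the test is rerun after every new batch."""
        hits = 0
        for _ in range(sims):
            # No real effect: both groups are drawn from the same distribution.
            a = list(rng.standard_normal(start))
            b = list(rng.standard_normal(start))
            while True:
                if stats.ttest_ind(a, b).pvalue < alpha:  # peek at the data
                    hits += 1                             # "it worked," so stop and report
                    break
                if len(a) >= max_n:                       # give up at the sample-size cap
                    break
                a.extend(rng.standard_normal(step))       # otherwise collect a few more
                b.extend(rng.standard_normal(step))
        return hits / sims

    print(peeking_false_positive_rate())  # noticeably above the nominal 0.05

Even though every individual test uses the standard 5% threshold, giving yourself repeated chances to stop pushes the overall false-positive rate noticeably above 5%, and it climbs further with every additional peek.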

Such flexibility can lead to a high number of false positives, Simonsohn notes. It can also lead to cynicism when people read the conclusion of a scientific study “that just doesn’t seem possible,” he adds. “But it’s important to remember that some of the major theories we subscribe to at first seemed like lunacy…. It would be unfortunate if the general public started to discard scientific conclusions because the standards surrounding research have gotten so low.”

Setting Minimum Requirements

As a solution to what the authors call “the flexibility-ambiguity problem,” they offer six requirements for authors and four guidelines for scientific journal reviewers that they contend would lead to more informed decisions regarding the credibility of researchers’ findings. Among the most important recommendations for authors is that they report how they selected their sample size, and that they report all experimental conditions, including failed manipulations, as a way to prevent the researchers from choosing only those results that are consistent with their hypothesis.

Among the most important guidelines for reviewers is one that requires authors to “demonstrate that their results do not hinge on arbitrary analytic decisions,” and another stating that if “justifications of data collection or analysis are not compelling, then reviewers should require the authors to conduct an exact replication.” So a reviewer “could say to an author, ‘I like your paper. Run study #4 again in the same way with twice as many people, and if it works, we will publish your paper,'” says Simonsohn. “But we never see that. We should.”

The authors suggest that the disclosure requirements noted above “impose minimal costs on authors, readers and reviewers…. We should embrace these disclosure requirements as if the credibility of our profession depended on them. Because it does.”

Simonsohn is quick to note that the problems they are describing do not stem from “malicious intent,” but rather from ambiguity built into the collection and analysis of data, and from “researchers’ desire to find a statistically significant result.”

While many of Simonsohn’s colleagues favor the key recommendations noted above for journal editors, critics of the paper have tended to fall into three main categories. “Nobody is against [our research],” he notes. “Most people are at least mildly in favor of it. But some people also say, ‘These are good points; let’s take more time to think about them,’ which is a typical response to reform of any kind.” Another reaction is for people to “agree with us, but suggest that it is not the job of a journal to enforce standards on research. ‘We are scientists, and everybody should report what they want,’ they say.” The third criticism is that “we are attacking psychologists without evidence to back up our complaints — all of which will diminish the impact of our profession and its ability to affect policy and get funding.”

Simonsohn doesn’t agree. “Our whole reason to exist as scientists is that we are supposed to be better equipped at finding out what’s true. If our methodology takes away that advantage, we might as well just ask for researchers’ opinions instead of their statistical analyses. Evidence obtained with boundless flexibility in analysis and data collection is just as likely to turn out to be as correct as a mere hunch — and perhaps less, if we only publish results from counterintuitive studies. I can’t imagine a more important issue to be addressing.”