Why Even the Best Forecasters Sometimes Miss the Mark

Despite all of the hype following Donald Trump’s presidential campaign launch last year, readers of the data-driven website FiveThirtyEight had reason to believe the businessman’s political popularity was a passing fancy. The site’s founder, Nate Silver, had been consistently saying Trump’s support was overstated.

Silver, who called 49 of 50 states in the 2008 general election, and all 50 in 2012, has become known as the gold standard in political forecasting. In September, he argued Trump had a 5% chance of winning the Republican nomination, likening him to flashes in the pan like Rick Perry, Howard Dean and Rudy Giuliani.

No matter what ends up happening at the convention this summer, Trump has had a much stronger run than Silver had projected. As Silver wrote in a subsequent column, “You have to revisit your assumptions.”

Forecasters are human, and forecasting models, for all their dispassion, are built by people. But specialists in forecasting say that, despite their limitations, such models remain a strong improvement over pure guesswork, or chance.

“It doesn’t mean [Silver] is not a good forecaster. It just means he’s not perfect,” says Philip E. Tetlock, Wharton management professor and co-author of Superforecasting: The Art and Science of Prediction.

In building a forecast, “you try to make it as reliable and systematic as possible, and try to improve the signal-to-noise ratio,” Tetlock says.

When Anything Is Possible

Silver’s forecast didn’t exactly fail, says Barbara Mellers, Wharton marketing professor. It’s just that the low-probability event is now more likely to occur, she notes. Furthermore, Silver went back and corrected his beliefs when things turned out differently than he would have imagined, making for a stronger forecast going forward.

“It’s very hard to show that a forecast is wrong,” Mellers says. “The only way that could occur is if the forecast is 100% or 0%. Anything else is possible.”

As an example, she notes that prediction markets said there was a 75% chance the Supreme Court would overturn the Affordable Care Act. But ultimately, the high court upheld it. And yet, people were writing op-eds saying the prediction markets got it wrong. “You don’t get it wrong with a 75% chance,” she says. “You’re just on the wrong side of maybe.”

A forecast always fails to some degree, because there’s always something that doesn’t happen exactly as predicted, according to Wharton marketing professor J. Scott Armstrong. The question, he says, is: How well does it do versus a different approach?

There have been enormous improvements in our ability to forecast anything, except perhaps financial markets, Armstrong notes. Weather forecasters have improved their methods over time, he says. But often forecasters on public policy issues and in business use their forecasts as a political tool to accomplish an end, and that makes the forecast worse, he points out.

Armstrong is the editor of the book Principles of Forecasting: A Handbook for Researchers and Practitioners. It summarizes knowledge from fields like economics, sociology and psychology, and applies it to areas like finance, personnel, marketing and production.

Armstrong describes how, in the early 1900s, forecasters tried to predict the quality of corn using data like previous years’ yields and the size of the kernels, rather than more arbitrary, intuition-based methods.

It was that same reliance on data, rather than more subjective measurements, that inspired Paul Meehl of the University of Minnesota to suggest ways to hire people based on their statistical merits rather than the intangibles that come with an in-person job interview, Armstrong says.

And it was that same approach that Oakland Athletics general manager Billy Beane put toward selecting baseball players. That approach made his ragtag team, selected by data on player performance, remarkably competitive despite the low payroll, Armstrong notes.

Historical Relationships

Part of forecasting is looking for historical relationships, according to Bob Hughes, senior research fellow at the American Institute for Economic Research, which works to forecast recessions.

“You look for imbalances, things that aren’t sustainable,” Hughes says.

With the benefit of hindsight, the Great Recession could have been forecasted by the steep rise in mortgage lending, the big increase in home prices and increased lending to high-risk customers, Hughes says.

“Supply was being ramped up to meet demand that was unsustainable, by lending that never should have taken place,” Hughes says. That should have been a red flag, he adds.

Another example was the NASDAQ stock bubble of about 15 years ago. Tech stocks went to extraordinarily high levels, as investors thought every startup would be the next Microsoft. That couldn’t last, Hughes says.

It’s somewhat easier to forecast a trend than it is tomorrow’s stock price or this quarter’s gross domestic product, but even that is very challenging, Hughes adds.

Mellers and Tetlock are leaders of the Good Judgment Project, a forecasting tournament sponsored by the U.S. government’s Intelligence Advanced Research Projects Activity. Forecasting tournaments have included tens of thousands of forecasters. They have attempted to answer roughly 500 questions from the intelligence community, including whether treaties would be signed, or whether Greece would exit the eurozone.

The professors found that some people were better than others at forecasting the correct answers to these questions. The best were dubbed “superforecasters,” hence the name of Tetlock’s book.

Mellers pointed to one question that shone a light on just how difficult it can be to forecast the future. That question was: Will there be a violent confrontation between China and a neighbor in the South China Sea?

Things were going along peacefully until, just before the end of the forecasting period, a South Korean ship apprehended a Chinese fisherman in South Korean waters. When the crew tried to arrest him, the fisherman took a piece of glass and fatally stabbed a coast guard officer.

The forecast for a fatality occurring was close to zero, Mellers says. One happened, but did it really reflect increased Chinese aggression? The fact that there was a fatality, even if it did not reflect the intent behind the question, “just reflected the fact that life is very difficult to predict,” she says.

“The world is full of uncertain events, and nobody can predict things perfectly,” Mellers says. “So we make the best estimates we can of likely outcomes, but we’re not always right. We sometimes have bad luck, even though we try our best to make good decisions. Bad things occur; we get bad breaks. Predictions are critical to good business. But they’re not always right.”

Multiple Methods

Eric Bradlow, Wharton professor of marketing, says forecasts are predictions, and any single forecast may have a positive or negative error. The best way to produce a forecast is to use multiple methods and average the results, he notes.

Looking at multiple presidential race surveys provides a sense of the uncertainty involved in this enterprise, Bradlow says.

“If you average them, you average out the errors and are more likely to get an unbiased estimate of the thing you are trying to measure,” Bradlow says. But he notes that some forecasts can actually sway the outcome of an election if people believe their votes are worthless.
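
Bradlow's point about averaging can be illustrated with a minimal sketch. The survey figures and the "true" vote share below are hypothetical numbers chosen only for illustration; they do not come from any real poll.

```python
from statistics import mean

# Hypothetical numbers for illustration only; none come from a real survey.
true_share = 52.0                                  # assumed "true" vote share, in percent
survey_estimates = [50.5, 53.8, 51.2, 54.1, 51.9]  # assumed estimates from five surveys

# How far off is each survey on its own?
individual_errors = [abs(estimate - true_share) for estimate in survey_estimates]

# How far off is the simple average of all five?
combined_estimate = mean(survey_estimates)
combined_error = abs(combined_estimate - true_share)

print(f"typical single-survey error: {mean(individual_errors):.2f} points")
print(f"error of the averaged estimate: {combined_error:.2f} points")
```

The averaged estimate can never be further off than the typical individual survey, and it does much better when the surveys miss in different directions. If every survey leans the same way, averaging will not remove that shared error.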

Armstrong has put the theory of combining forecasts to work to try to predict presidential elections. He is one of the founders of PollyVote, the evidence-based forecast that began in 2004. It combines various forecasting methods to develop a projection of how the Democratic and Republican candidates will fare in each general election.

In 2004, PollyVote predicted George W. Bush would receive 51.5% of the popular vote; he received 51.2%. PollyVote predicted Barack Obama would receive 53.9% in 2008; he received 53.7%. And it predicted Obama would receive 51.3% in 2012; he received 52%. (As of mid-April, PollyVote was predicting the Democratic candidate would win 53.5% this November, compared to the Republican, who would win 46.5%.)
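
The size of those misses can be worked out directly. The sketch below uses only the three predicted and actual vote shares quoted above; the variable names are illustrative.

```python
# (year, predicted share, actual share), in percent, as quoted in the article
results = [
    (2004, 51.5, 51.2),
    (2008, 53.9, 53.7),
    (2012, 51.3, 52.0),
]

# Absolute miss for each election, in percentage points
errors = {year: abs(predicted - actual) for year, predicted, actual in results}

# Average miss across the three elections
mean_abs_error = sum(errors.values()) / len(errors)

for year, error in errors.items():
    print(f"{year}: off by {error:.1f} points")
print(f"average miss: {mean_abs_error:.1f} points")
```

On these three elections the forecast missed by 0.3, 0.2 and 0.7 points, for an average miss of roughly 0.4 points.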

PollyVote relies on a compilation of forecasts across various methods, including polls, prediction markets, expert predictions, citizen forecasts (what ordinary Americans predict will happen, based on conversations with other people in their lives), econometric models (economic conditions and public opinion regarding the incumbent party), and index models.

Silver, Armstrong says, has been using judgement, and “that’s going to hurt your forecast. Just go with the data.”

In a November piece, Silver said Trump’s support was “something like 6% to 8% of the electorate overall, or about the same share of people who think the Apollo moon landings were faked.”

Even the very best political forecasters, when they say something will only happen 5% of the time, “and it’s perfectly calibrated, you can expect that thing to happen 5% of the time,” Tetlock said. “When you give yourself 5%, you’re giving yourself some wiggle room. Nate Silver would agree with you [that] he underestimated Trump.”
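
Tetlock's notion of calibration can be checked against a track record: group a forecaster's predictions by the probability stated, then compare that probability with how often the events actually happened. The sketch below is one simple way to do that, not a description of how any forecaster mentioned here is scored. The track record is hypothetical and the function name is illustrative.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (stated_probability, happened) pairs by stated probability.

    Returns {stated_probability: (number_of_forecasts, observed_frequency)}.
    """
    buckets = defaultdict(list)
    for probability, happened in forecasts:
        buckets[probability].append(1 if happened else 0)
    return {
        probability: (len(outcomes), sum(outcomes) / len(outcomes))
        for probability, outcomes in sorted(buckets.items())
    }

# Hypothetical track record: 20 forecasts made at 5%, one of which came true,
# and 4 forecasts made at 75%, three of which came true.
record = [(0.05, i == 0) for i in range(20)] + [(0.75, i < 3) for i in range(4)]

table = calibration_table(record)
for probability, (count, frequency) in table.items():
    print(f"stated {probability:.0%}: {count} forecasts, happened {frequency:.0%} of the time")
```

In this made-up record the forecaster is well calibrated even though one of the 5% events occurred. A single surprise says little on its own; the pattern across many forecasts is what matters.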

But will Donald Trump actually win the Republican nomination? Despite his mighty run so far, the race isn’t over yet. Maybe Nate Silver had it right after all.
