If you’ve ever seen an ad for an investment product, you’ve heard the phrase “past performance does not guarantee future results.” And of course, it doesn’t. But that doesn’t stop businesses, governments, organizations and individuals from trying to predict the future. In an effort to improve our odds, a government research agency, the Intelligence Advanced Research Projects Activity (IARPA), has been sponsoring annual competitions among university teams to devise better ways to measure and improve the art of forecasting world events.
Wharton marketing professor Barbara Mellers has been leading one of those teams — one that won the competition three years straight.
In this interview with Knowledge@Wharton, Mellers discusses her team’s findings, what makes some people better prognosticators, and how the best forecasters can be given a boost.
An edited transcript of the conversation follows.
Recruiting the intelligentsia:
I’ve been involved in a multiyear forecasting tournament sponsored by IARPA, which is the research branch of the intelligence community. IARPA sponsored five different university teams that competed with each other to come up with the best possible ways to measure and aggregate forecasts about events all over the world. Those events included military conflicts, elections, pandemics, refugee flows and even things like commodity prices.
Now, what we did was recruit thousands of forecasters from blogs, professional societies, research centers and so forth, and have them make forecasts over a period of a year. They were given questions every two weeks. They logged on to a website, where they made their predictions, and they could go back to update their forecasts as often as they wanted.
Now, we didn’t really know what to do to improve forecasts, so we did what came naturally, and that was to run experiments. We found that three factors helped a great deal. One was training people. We devised a one-hour probability-training module, and that seemed to improve predictions. Another was putting people in teams, as opposed to having them work individually, and that improved predictions. The interaction, the information sharing and the debates about rationales boosted accuracy.
And lastly, we found that tracking was a huge booster of forecasting accuracy. At the end of each year, we took the top 2% of [those] thousands of forecasters, put them together in elite groups and gave them the title of Super Forecasters. These people increased their accuracy more than we could possibly have imagined. They interacted more. They looked for more information. The net result was amazing. In fact, they helped us win the tournament three years in a row.
Making better predictions:
Everyone relies on predictions, and we know a lot more about predictions than we did prior to this tournament. In the world of business, people care about whether to invest in research or expand to a new market, and they care about what consumers will want, which product extensions they’ll prefer and when products will be ready for distribution.
And businesses can, I think, use the insights from this tournament to make better predictions. Not everything necessarily generalizes smoothly, but there are a lot of psychological and statistical insights that we now know make predictions better.
I was surprised by an awful lot of things in this tournament. I was surprised that our training module worked. It’s tough to design a module that has any effect on judgmental accuracy.
I was very surprised that teams worked better than independent forecasters. I read “The Wisdom of Crowds” and I assumed that independent forecasters would do better because their errors would average out. But the benefits of sharing information and talking about rationales outweighed the benefits of independence. So, in our case, teams worked better.
And I was super surprised about the effects of Super Forecasters. It’s like tracking kids in schools and putting the best ones together. The synergy that came from that was phenomenal.
I think there are several reasons why our teams worked so well. One of them was that they worked online, and it’s tough to be a dominant bully online. People logged on whenever they felt like it, so things were sequential rather than simultaneous, and simultaneous discussion is where I think groupthink is more likely to occur. The forecasters also had a lot of respect for each other, and I think that’s another big part of a good recipe.
The value of a good prediction:
I think companies can improve their predictions. I think they can create their own set of super forecasters, and they can do better at forecasting the future. And better is really relative. I mean, all you have to do is be better than the next guy.
We’re not looking for perfection here and obviously we’re not going to get it. In the case of the U.S. government, policymakers face decisions that involve billions of dollars and thousands of lives. And in these cases, the stakes are so high that even a tiny little edge is huge progress.
One of the implications of our work, I think, is that good predictions involve both psychology and statistics. It’s a combination of understanding the person and understanding statistical distributions and statistical information. We now know how to do much better at devising algorithms that aggregate multiple forecasts. And we also know a lot about what conditions or environments bring out the best in individual forecasters.
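The interview does not describe the team’s aggregation algorithms. As a hypothetical illustration of what such an algorithm can look like, the sketch below averages several forecasts in log-odds space and then “extremizes” the result, pushing it away from 50%, an idea discussed in the forecasting research literature. The function names and the extremizing factor `a = 2.0` are assumptions chosen for illustration, not the project’s method or values.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


def aggregate(forecasts, a=2.0, eps=1e-6):
    """Hypothetical aggregator: mean of log-odds, scaled by an extremizing factor a.

    a = 1 gives the plain log-odds average; a > 1 pushes the result away from 0.5.
    The default a = 2.0 is an arbitrary illustrative choice.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    # Clip away from 0 and 1 so the log-odds stay finite.
    clipped = [min(max(p, eps), 1 - eps) for p in forecasts]
    mean_log_odds = sum(logit(p) for p in clipped) / len(clipped)
    return inv_logit(a * mean_log_odds)


team = [0.70, 0.75, 0.80]  # hypothetical forecasts from three team members
print(f"simple mean:         {sum(team) / len(team):.3f}")
print(f"log-odds mean (a=1): {aggregate(team, a=1.0):.3f}")
print(f"extremized (a=2):    {aggregate(team):.3f}")
```

With `a = 1` the result is just the log-odds average (about 0.75 here, close to the simple mean); with `a = 2` it moves to roughly 0.90. Whether, and how much, to extremize is an empirical question that would have to be tuned on resolved questions.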
On putting predictions to the test:
Every day we hear predictions. We have pundits and experts and gurus and specialists making predictions about what will happen in the future. And many of those predictions are so vaguely stated that we could never in a million years figure out how to test them. They’re statements like, “There may be an increase in conflict in Yemen in the next two weeks.” Or, “The situation in Baghdad will get worse before it gets better.” Or something like that — that’s not a question or a statement that passes the clairvoyance test. You can’t figure out later who’s right and who’s wrong.
Now, we look to these people for advice. We look to them for insights. And we’re getting very little from those vague predictions. I think the way we can do better is to keep score. We’ve got to have predictions stated in such a way that they’re testable. Then we can find out who’s right and who’s wrong, and how we can learn to do better.
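Keeping score presupposes a scoring rule. The interview does not say which rule the project used; a common choice for probability forecasts on yes/no questions is the Brier score, the mean squared gap between each forecast probability and the outcome coded 1 if the event happened and 0 if it did not. Lower is better. The sketch below uses the binary form (range 0 to 1); the function name and the two forecasters are hypothetical. A statement like “there may be an increase in conflict” cannot be scored this way at all, because it has neither a probability nor a clear resolution.

```python
def brier_score(forecasts, outcomes):
    """Binary Brier score: mean of (p - o)^2, with outcomes coded 1 or 0.

    0 is perfect; always answering 50% earns exactly 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty lists")
    for p, o in zip(forecasts, outcomes):
        if not 0.0 <= p <= 1.0 or o not in (0, 1):
            raise ValueError("forecasts must be in [0, 1]; outcomes must be 0 or 1")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 0, 1]         # hypothetical resolved questions
confident = [0.9, 0.2, 0.7]  # a forecaster who committed to numbers
hedger = [0.5, 0.5, 0.5]     # a forecaster who never committed
print(f"confident: {brier_score(confident, outcomes):.3f}")
print(f"hedger:    {brier_score(hedger, outcomes):.3f}")
```

Here the committed forecaster scores about 0.047 and the hedger 0.25, so the record shows who was closer to being right.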
We are in the fortunate position of having an empirical basis for the claims we make. We know what works based on data, based on experiments. And many of the books and methods and techniques for doing better forecasting are simply not tested. So the empirical side of things is, I think, something unique to our project.
We’ve learned a lot about what makes things better. We’ve learned, for example, that survey formats with statistical algorithms combining the forecasts can outperform prediction markets. We’ve learned that if you measure probabilities while people are trading in the market, [and] you also ask them “What’s your probability that event X will occur?” you can improve forecasting accuracy by combining both the prices and the probabilities. Both of these findings are surprising from an economic perspective.
We know a bit more about individual differences that correlate with forecasting accuracy. So we can say a little bit more about who the best forecasters are, especially in this geopolitical context.
Not surprisingly, they tend to be smart, they tend to know a lot, they have a lot of political knowledge. But they are also more likely to be actively open-minded thinkers. They are more analytical. They are more likely to take a scientific worldview. They’re more likely to take multiple perspectives on a question, and use multiple reference classes. They’re more likely to use probabilities in a more granular or nuanced fashion. They’re more likely to say 17% and 83% rather than 20% and 80%.
And it turns out that extra granularity has information in it. If you round forecasts to the nearest 10% (10%, 20%, 30%, 40% and so on), most forecasters do worse, and that suggests there are valuable signals in that granularity, not just noise.
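One way to see why rounding can cost accuracy, under a strong simplifying assumption, is through the expected Brier score. If a forecaster is perfectly calibrated (an event given probability p really happens with probability p), then reporting some other number q has expected binary Brier score p(1 - p) + (q - p)^2, so any rounding that moves q away from p adds the penalty (q - p)^2. This is a textbook identity, not the project’s analysis, and real forecasters are not perfectly calibrated, so it only illustrates the mechanism. The helper names and the forecasts below are hypothetical.

```python
import math


def round_to_tenth(p):
    """Round a probability to the nearest 0.1 (halves round up)."""
    return math.floor(p * 10 + 0.5) / 10


def expected_brier(true_p, reported_q):
    """Expected binary Brier score of reporting q when the event occurs with
    probability true_p: E[(q - Y)^2] = p(1 - p) + (q - p)^2."""
    return true_p * (1 - true_p) + (reported_q - true_p) ** 2


# Hypothetical granular forecasts, assumed to be perfectly calibrated.
forecasts = [0.17, 0.83, 0.04, 0.62]
for p in forecasts:
    q = round_to_tenth(p)
    print(f"p={p:.2f} -> q={q:.1f}: "
          f"unrounded {expected_brier(p, p):.4f}, rounded {expected_brier(p, q):.4f}")
```

Each rounded forecast has a slightly worse (higher) expected score than the unrounded one; the difference is exactly the squared distance that rounding moved the forecast.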
The IARPA tournament will close on June 2, in another month. But that’s not the end of [our] forecasting. The Good Judgment Project will be opening a public tournament this fall. And we have lots of hypotheses to test. We’ll need lots of volunteers. And we’d love to have people affiliated with Wharton.
So, if this is something that interests you, or you know people who would be interested, go to goodjudgmentproject.com to get more information. And we’d be extremely grateful and delighted to have you join us.