Scott E. Page, professor of complex systems, political science and economics at the University of Michigan, doesn’t want people to limit themselves to linear thinking. In his new book, The Model Thinker: What You Need to Know to Make Data Work for You, he explains how taking a multi-paradigm approach puts more power into solving problems, innovating and understanding the full range of consequences of complex actions. He believes using many models is the best way to make sense of the reams of data available in today’s digital world. Page recently spoke on the Knowledge at Wharton radio show on Sirius XM about why it’s important to widen your data lens.
An edited transcript of the conversation follows.
Knowledge at Wharton: What is multi-model thinking?
Scott Page: We live in this time where there are two fundamental things going on. One is, there’s just a firehose or hairball of data, right? Tons of data out there. At the same time, we have this recognition that the problems and challenges that we confront are complex. And by that, I mean high-dimensional, lots of interdependencies, difficult to understand. So, what do we do? How do we use that data to confront the complexity?
The philosophy I’m putting forward goes as follows: You have to arrange that data on some sort of model. You want to think of a model as Charlie Munger, the famous investor, describes it — a latticework of understanding on which you can array the data.
But models by definition are simple, so there’s a disconnect. I’m trying to understand something complex with something that’s simple. What I’ve bought with that simplicity is logical coherence. But what I’ve lost in that simplicity is any notion of coverage because there’s too much stuff I’ve got to leave out.
Instead, what I propose you do is bring an ensemble of models to bear. This is already a thing. People in machine learning have been doing it; it’s behind all the fancy stuff going on in AI. If you really unpack what’s going on in those sophisticated algorithms, they really are ensembles of little algorithms and little rules. The idea is, any one model is going to be wrong, but many models give you not only a lot of coverage, but also a collection of coherent understandings of a complex phenomenon.
Knowledge at Wharton: Is this multi-model approach common in the business world?
“You can’t be many smart people. But you can construct a handful of models that will allow you a much richer understanding of anything you want to look at.”
Page: What’s interesting is when you look at people who are really on top of their game. Charlie Munger’s strategy from the beginning has been to use a lattice of models. Andy Lo from MIT, in analyzing the 2008 financial crash, read 21 books on it. He said each one lays out a different story as to why we saw the financial crash, and each one is right in its own way.
If you were trying to figure out something like how to reduce inequality, and you read [economist Thomas] Piketty, you’d get just one particular version. You know, r is greater than g: the return on capital outpaces the growth rate, so wealth keeps accumulating at the top. But that wouldn’t say anything about assortative mating. It doesn’t explain [Facebook founder Mark] Zuckerberg or [Microsoft founder Bill] Gates. So, you think, “Wait a minute, one model alone doesn’t make sense of this.”
There are people who are really at the forefront of this, like Regina Dugan, who used to run DARPA (Defense Advanced Research Projects Agency) and then was at Google and Facebook. She promoted this idea of collective intelligence. There’s a whole sort of set of people in this space now. What Regina and other people in the collective intelligence community would say is that collective intelligence comes from ensembles of smart people who know different things. I’m saying, you can’t be many smart people. But you can construct a handful of models that will allow you a much richer understanding of anything you want to look at.
Knowledge at Wharton: How has big data changed the process of using models?
Page: That’s a great question. Let’s take a famous model from epidemiology called the SIR model, which stands for Susceptible, Infected, Recovered. Somebody catches a disease, or there’s some mutation in someone’s flu virus, and suddenly you’re infected. For the disease to spread, a susceptible person has to run into an infected person.
Early on, there aren’t many infected people, so the curve starts out really slow. But as more and more people become infected, it gets faster and steeper. Once almost everybody has been infected and recovered, it flattens out, so you get an S-shaped curve.
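To make that shape concrete, here is a minimal discrete-time sketch of SIR dynamics in Python. The transmission and recovery rates are illustrative assumptions rather than values fitted to any real outbreak; the point is just the slow-then-steep-then-flat curve Page describes.

```python
# A minimal discrete-time SIR sketch. beta (transmission) and gamma (recovery)
# are made-up illustrative rates, not estimates from any real epidemic.
def sir(beta=0.3, gamma=0.1, s=0.999, i=0.001, r=0.0, steps=150):
    history = []
    for _ in range(steps):
        new_infections = beta * s * i   # susceptibles meeting the infected
        new_recoveries = gamma * i      # infected who recover
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# The cumulative share ever infected (i + r) traces the S-shaped curve:
# flat at first, then steep, then flat again.
trajectory = sir()
for t in (0, 40, 80, 120, 149):
    s, i, r = trajectory[t]
    print(f"t={t:3d}  share ever infected = {i + r:.3f}")
```

Fitting a curve like this to real case counts is what turns the model from a story into a forecasting tool.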
Let’s go back 30 years, so we’re safely in the no-data zone. It would be like, “Well, you know, there seems to be an outbreak of flu in this region. Do we vaccinate people?” Now you can get 40 days of data and predict within a range how many people are going to get this disease, and you can intervene.
This summer, two Major League Baseball players got hand, foot and mouth disease, which is something people usually only get in preschool. Compare the health department in Singapore, which keeps monthly data on exactly how many people have the disease, with here, where we had intrepid reporters outside clinics in Philadelphia saying, “Doctors told me they have seen a rare number of cases.” For all we know, we could have had an epidemic. That disease could have mutated. With data like Singapore’s, you can actually fit the model.
One of the premises of the book is that many-model thinking works in two ways. One is, given any problem, you can apply many models. But the other thing is, given a model, like if I learn the SIR model, I can apply it anywhere. Anything that has this kind of viral-like spread, you can fit the model.
“Given any problem, you can apply many models.”
It turns out some physicists took the SIR model and applied it to the spread of [pop singer] Justin Bieber’s popularity. They showed that “Bieber fever” was actually worse than the mumps, literally one of the most contagious diseases we’ve ever seen, which is just hilarious, right? But in another sense, it’s not. Because if you’re Bieber’s manager and you fit that SIR model, the curve starts out kind of flat, but you know this is a diffusion process. You can anticipate things. You turn down certain opportunities because you know better ones are going to come along, because you just have a better sense of how things are going to go.
Knowledge at Wharton: You say one of the benefits of multi-model thinking is greater wisdom. Can you take us further into that?
Page: There’s a wonderful way to organize how we think about the world called the wisdom hierarchy. Imagine all this data just floating out there. What we do to make sense of the world is bend that data into information. When I say “unemployment,” I’m aggregating something. Any graph you see in a newspaper, or any count of women voters for Trump or something, is binning data into categories, and we think of that categorization as information. Models then find relationships within those bins. Something like force equals mass times acceleration is a piece of knowledge, right? Supply equals demand is a piece of knowledge.
You’ve got data, information and then knowledge. On top of this pyramid is wisdom. Wisdom consists of sometimes knowing which piece of knowledge to apply, and other times combining the knowledge in an interesting way and thinking about how it works.
What I found fun about thinking through that transition from knowledge to wisdom is realizing just how complicated it is. I wrote a paper with Hélène Landemore, who’s a brilliant young political philosopher at Yale. There’s this guy, Jürgen Habermas, a political philosopher who talks about reaching consensus. We realized, let’s suppose that you are a classic Keynesian macroeconomist and you’ve spent 20 years learning sophisticated Keynesian models of the economy that you fit to all this data. And suppose I’m a real business cycle economist, like from the University of Chicago, and I’ve spent 20 years learning that. It’s not like we can get together in a room for 20 minutes and reach some grand consensus on a new model of the economy.
[Landemore] has this notion of what she calls positive dissensus among the models. What we hope is that we can achieve wisdom by me saying, “Actually, you may be making more sense here than I am. Your model fits better. Given the things I leave out of my model and the things that you include in your model, maybe we should put a little more weight on what you’re thinking we should do.” It also depends on what we’re using the model for. You could be using these models just to explain data. You could be using them to guide policy. You could be using them to design policy. Depending on what you’re trying to do, you may use the ensemble of models in different ways. But if you stick to just one, you’re leaving out a lot of variables, and you’re really likely to make a mistake.
Knowledge at Wharton: In the book, you bring up the opioid crisis and income inequality, and you talk about the potential of many-model thinking to address those problems. Let’s talk about each of those.
Page: With opioids, one thing you can do is just use a simple, linear model to ask, did these drugs work? That’s what the government does when it runs tests. The evidence showed that they really did reduce pain, and they really did allow people to get back to work sooner. If you used just that model, you’d say, “Let’s approve these things.”
But another model you can apply is something called a Markov model. With a Markov model, you imagine someone in a state. I can be in a state of pain. I can be in a state of addiction. Or I can be in a no-pain, everything’s-fine state. You can imagine people moving among those three states. If I’m in the pain state, to get me to the no-pain state, you might give me opioids. That would be the intervention.
The danger is that I might move to the addicted state. When they do drug approval, they estimate those models: what’s the likelihood of becoming addicted? But here’s the tricky thing about one of these Markov models, which is a kind of systemic model. People are moving among these states, and the dynamics are nonlinear. If I change the probability of becoming addicted from 1% to 3%, that small increase can move the number of addicts from, say, 2% to 8%. You can get this huge amplification.
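A toy version of that chain shows the leverage. In the sketch below, every transition probability is an assumption made up for illustration, not a clinical estimate; what it demonstrates is how a two-point rise in the per-period addiction risk produces a much larger jump in the long-run share of people stuck in the addicted state.

```python
import numpy as np

# A toy three-state Markov chain (no pain, pain, addicted). Every transition
# probability here is an illustrative assumption, not a clinical estimate.
def long_run_addicted(p_addict):
    P = np.array([
        [0.90, 0.10, 0.00],                 # no pain: occasionally develops pain
        [0.60, 0.40 - p_addict, p_addict],  # pain: usually treated, small addiction risk
        [0.05, 0.00, 0.95],                 # addicted: very hard to escape
    ])
    dist = np.array([1.0, 0.0, 0.0])        # everyone starts in the no-pain state
    for _ in range(2000):                   # iterate toward the long-run distribution
        dist = dist @ P
    return dist[2]

for p in (0.01, 0.03):
    print(f"addiction risk {p:.0%}/period -> long-run addicted share {long_run_addicted(p):.1%}")
```

With these made-up numbers, the long-run addicted share roughly triples when the per-period risk goes from 1% to 3%, the same kind of amplification Page describes.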
In the tests, they gave people small amounts of opioids. But once doctors were prescribing them, they were giving people a month’s supply. You give people a month’s supply, and you get just a slight increase in the probability of becoming addicted. There’s some evidence of this. And in rural areas, where somebody has to drive a long way in, the doctor is probably more likely to write a longer prescription. Once you’re in that addicted box, once you’re in that state, you can’t get out.
They knew opioids were potentially addictive. But had they used a Markov model more seriously, they would have looked at what are called comparative statics, where you change one variable and ask what the effect is. You would have recognized that the danger of addiction is so high that we should limit prescriptions to five days right out of the box. But they didn’t, partly because they never ran tests where people took opioids for long stretches of days, because they didn’t want to create addicts.
Knowledge at Wharton: What about income inequality?
“You’ve got data, information and then knowledge. On top of this pyramid is wisdom.”
Page: I had so much fun writing that portion of the book. There’s a part of me that thought I probably should have written an entire book just on the many lenses of income inequality. But let’s throw out just a couple.
Piketty’s model essentially says returns to capital are larger than the growth rate in the economy. One of the models I describe in the book is called the Rule of 72, which says that if you take your interest rate and divide it into 72, that’s how long it takes for your money to double. If the rich are getting 6% on their money, every 12 years it doubles. So, in 48 years, it’s going to double four times, which is a 16-fold increase.
If the economy’s only growing at, let’s say, 3%, then it’s only going to double twice, which is going to be a fourfold increase. As long as the rich don’t spend a ton of their money, they’re just going to have a lot more of it every generation. He shows that’s true over hundreds and hundreds of years. That’s one model. And it doesn’t explain Zuckerberg or Justin Bieber or Beyoncé or Oprah or anybody else.
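The arithmetic behind that comparison takes only a couple of lines to verify. The sketch below just encodes the Rule of 72 as Page states it, using the 6% and 3% rates from his example.

```python
# Rule of 72: doubling time in years is roughly 72 / growth rate (in percent).
def multiple_after(rate_pct, years):
    doubling_time = 72 / rate_pct         # years needed for one doubling
    return 2 ** (years / doubling_time)   # total growth multiple after `years`

print(multiple_after(6, 48))  # capital at 6%: doubles every 12 years -> 16.0
print(multiple_after(3, 48))  # economy at 3%: doubles every 24 years -> 4.0
```

At those rates, wealth grows 16-fold while the economy grows fourfold over the same 48 years, which is Piketty’s r-greater-than-g story in miniature.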
There are other models. One of them is the superstar model, which shows that, because of changes in technology, we can all see the best performer. We can all watch LeBron James on TV. We can all see the movies that everybody sees, as opposed to having to go to plays. There’s wonderful work by Duncan Watts and Matt Salganik, who ran a music lab experiment where undergrads could download any songs they wanted. When listeners couldn’t see what songs other people downloaded, you got a pretty nice, even distribution. But once people could see what songs others downloaded, you got this really distorted distribution with some huge winners. There’s huge social influence, and you get this huge superstar effect. You get these winner-take-all economies. So, that’s another answer.
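A toy cumulative-advantage simulation conveys the flavor of that result. The sketch below is not Salganik and Watts’s actual design; the song count, listener count, and pick-in-proportion-to-popularity rule are all assumptions, but it reproduces the qualitative finding that visible popularity concentrates downloads on a few winners.

```python
import random

# A toy cumulative-advantage sketch in the spirit of the music lab experiment.
# Song count, listener count, and the choice rule are illustrative assumptions.
def top_song_share(social_influence, n_songs=50, n_listeners=10_000, seed=1):
    random.seed(seed)
    downloads = [1] * n_songs  # seed every song with one download
    for _ in range(n_listeners):
        if social_influence:
            # popularity is visible: pick in proportion to past downloads
            song = random.choices(range(n_songs), weights=downloads)[0]
        else:
            # popularity is hidden: pick uniformly at random
            song = random.randrange(n_songs)
        downloads[song] += 1
    return max(downloads) / sum(downloads)

print(f"popularity hidden:  top song's share = {top_song_share(False):.1%}")
print(f"popularity visible: top song's share = {top_song_share(True):.1%}")
```

Any single run depends on the random seed, but with popularity visible the top song reliably captures several times the share it gets when choices are independent.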
But then you can look at things like sociological models based on what’s called assortative mating. This model is so simple, but it ends up having huge power. If two people get married, the family income equals the incomes of the spouses added together. What’s frightening, though, is that this has enormous econometric power in explaining the changes in the income distribution. People now marry a little bit later. In the past, not as many women worked. You’ve got high-income people marrying high-income people, and if you look at the correlation in highest degree earned, it’s gone way up. In particular, women who have graduate degrees tend not to marry men who don’t have graduate degrees. That just creates a huge amplification.
Imagine I just create a population of people, give them all incomes, and then have the highest-income people marry each other. That alone drives household inequality way up, without any superstar effect at all. The Piketty story is like, “This is just a fundamental part of economics, and maybe we should change our policy on estate taxes.” But this last one is just a sociological phenomenon. And that could change, in the sense that couples could decide, “Why doesn’t one of us become an artist or volunteer?” There’s a possibility that our narrative could change as a culture.
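That thought experiment is easy to run. In the sketch below, the lognormal income draws and the perfect rank-order matching are assumptions chosen for illustration; the point is that sorting alone widens the spread of household incomes relative to random matching.

```python
import random
import statistics

# A toy version of the thought experiment: draw individual incomes, pair people
# at random or by income rank, and compare household income dispersion.
# The lognormal distribution and perfect sorting are illustrative assumptions.
def household_income_cv(assortative, n_households=10_000, seed=7):
    random.seed(seed)
    incomes = [random.lognormvariate(10.5, 0.6) for _ in range(2 * n_households)]
    if assortative:
        incomes.sort()            # highest earners pair with highest earners
    else:
        random.shuffle(incomes)   # random matching
    households = [incomes[2 * k] + incomes[2 * k + 1] for k in range(n_households)]
    # coefficient of variation: spread of household income relative to its mean
    return statistics.stdev(households) / statistics.mean(households)

print(f"random matching:      CV = {household_income_cv(False):.2f}")
print(f"assortative matching: CV = {household_income_cv(True):.2f}")
```

Under random matching, high and low earners partly offset each other within households; under perfect sorting, individual inequality passes straight through to households.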
People sit around and say, “We need to eradicate inequality.” If you look at those three reasons as to why it’s happening, the only one that naturally lends itself to a policy solution is Piketty’s argument. It’s not that we want to have a horse race between these three models and say assortative mating wins. But what we want to do is say, “Wow, there’s a lot of stuff going on here.” Here’s where I think the beauty of the whole many-model approach is: Every one of these models is pretty simple.