Confidence Games: Why People Don’t Trust Machines to Be Right

Facts vs. intuition. Man vs. machine. Algorithms vs. emotions. When we’re given the choice of trusting another person’s conclusions, or our own guesses, or accepting facts as based on algorithmically analyzed data, most of us tend to trust the human more. But that’s not always the best choice.

In a recent interview on the Knowledge at Wharton show on Wharton Business Radio on SiriusXM channel 111, Wharton practice professor of operations and information management Cade Massey and Wharton doctoral student Berkeley Dietvorst explain what their research — “Algorithm Aversion: People Erroneously Avoid Algorithms after Seeing Them Err,” which was co-authored with Wharton operations and information management professor Joseph Simmons — revealed about the biases hiding in our decision-making, and why we’re so reluctant to trust computer-generated answers if the machine has ever been less than perfect — even though our own record is even worse.

An edited transcript of the conversation appears below.

Knowledge at Wharton: Let’s start with how this research came about.

Cade Massey: This is Berkeley’s research. But some of the initial thinking came about because of my experience working with firms on analytics. [We were] doing research or consulting projects that involved analytics [and] algorithms, and then taking those results into organizations and running into more or less brick walls on influencing their judgment, even though we knew that we had the goods, we had the data, we had the better insight. But there’s resistance to accepting that kind of judgment.

…. [W]e know that people don’t like [those data-driven answers], but we didn’t really understand why. And we sure didn’t have any insight into what we could do to make people more open to them.

Knowledge at Wharton: How did this all play out for you?

Berkeley Dietvorst: Originally, we were thinking that we could run experiments where people wouldn’t like the algorithms, and then once they got to know them and saw that they were good at making forecasts they’d start to use them more. [But] when we started running these experiments, we actually found completely the opposite, where before people had seen an algorithm perform and make any mistakes, they were willing to use it. But once people had seen an algorithm make a mistake in our experiments, they were very, very unlikely to use it and didn’t like it anymore.

Knowledge at Wharton: It’s interesting because … when you have the data sitting in front of you and, for the most part, it’s foolproof, why would you not accept it? But for some reason, it’s not as accepted as it probably should be.

Massey: And that’s really where Berkeley’s research and Berkeley’s insight has led us…. People seem to be harder, in a way, on algorithms than they are on people. No algorithm’s perfect. I mean, even the really, really good ones aren’t perfect. And that little error seems to be a real problem for an algorithm to overcome.

Knowledge at Wharton: So do you have any speculation as to why that is?

Dietvorst: I think one of the main things behind this decision that people make not to use an algorithm after seeing it err, is that once people have seen an algorithm err they assume that it’s going to keep making mistakes in the future, which is probably true to some degree.

Massey: But that’s one of the benefits of these things, is that they’re consistent.

Dietvorst: That’s one of the real reasons we like them. They’re consistent.

Massey: But the bad assumption is that the human won’t keep making errors and the human could even improve, which probably in a lot of contexts isn’t true. The human will keep making worse mistakes than the algorithm ever would.

Knowledge at Wharton: And it’s the ability of the computer or the algorithm to adapt at some point — being given fresh data or being changed in some respects — that people don’t have a handle on?

Dietvorst: Yes, one thing is algorithms can change and improve in the future. But another thing that’s important to realize is, it’s just impossible to be a perfect forecaster, because so many outcomes in the real world are determined by random chance. So, even if you’re completely spot on with your predictions and you’re making the absolute right prediction, you won’t be right every time.

Massey: There’s this great phrase from a University of Chicago researcher … Hilly Einhorn. He has a great paper … “Accepting Error to Make Less Error.” And the idea is you have to accept that your model isn’t perfect, but by doing that overall, you make less errors, because you would be even less perfect.

Knowledge at Wharton: Right.

Massey: But one of our senses is — and we’ve got some evidence on this — that people are kind of in pursuit of perfect prediction. They don’t want to give up on the chance of always being right, even though essentially it’s impossible to always be right.
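
The point about accepting error can be made concrete with a toy calculation. This is a minimal sketch and not taken from the study: it assumes a two-outcome forecast where one outcome occurs with a hypothetical probability of 0.7, and it compares a fixed rule that always predicts the likelier outcome with a strategy that tries to catch every outcome by guessing each one as often as it occurs. The function names and the 0.7 figure are illustrative assumptions.

```python
def accuracy_always_likely(p: float) -> float:
    """Expected accuracy when always predicting the more likely outcome."""
    return max(p, 1.0 - p)


def accuracy_probability_matching(p: float) -> float:
    """Expected accuracy when guessing each outcome as often as it occurs."""
    return p * p + (1.0 - p) * (1.0 - p)


p = 0.7  # assumed chance of the more likely outcome (hypothetical)
print(round(accuracy_always_likely(p), 2))         # 0.7
print(round(accuracy_probability_matching(p), 2))  # 0.58
```

Under these assumptions, the fixed rule is wrong 30% of the time no matter what, yet it still beats the strategy that chases a perfect record.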

Knowledge at Wharton: But you also bring up in the study that there is a financial component to this as well.

Massey: We have a strong hypothesis here, and so far our evidence is only ancillary. But the strong hypothesis is that, ironically, when the stakes are higher, people are even more averse to using algorithms. So, they might be willing to go with the computer if they’re not too worried about the outcome. But if you put the Super Bowl on the line, for example — it comes down to one play — well, by God, we can’t let an algorithm make the decision then. And we’ve got a little evidence like that but we haven’t looked at it systematically.

Dietvorst: We’re running that study right now in the lab.

Knowledge at Wharton: In terms of the data being completely opposite of what your expectations were, obviously that did throw you for a little bit of a loop. But tell us a little bit more about the understanding of that aspect.

Massey: Well, the first time we saw that we obviously had to replicate it and make sure that the same thing would happen when we ran the experiment again. But then, we started to look at other measures that made that story make sense. So, for example, we did find that participants [lost] confidence in the algorithm once they’d seen it perform, but those who hadn’t seen it perform had higher confidence in it. Once we started to look at all the data, all the signs pointed to the fact that it was seeing algorithms that made people not want to use them.

Dietvorst: Those confidence data are really interesting. [People] don’t lose confidence in themselves when they’re wrong. They lose much less confidence in themselves when they’re wrong, even if they’re more wrong than the model.

Knowledge at Wharton: How do you take this data in this study and apply it in the real world?

Massey: For the current project that we’re working on, we’re seeing how we can get people to use these algorithms even after they’ve learned that they aren’t perfect. We’ve run experiments where people either had to go with a model’s forecast or their own. Or they could adjust the model’s forecasts and give some of their human judgment, but it was constrained.

So, for example, the algorithm puts out a number and you can adjust it up or down by five. And we found that people like that much, much more than not being able to give any of their input. And actually, when that method errs and people learn about it, they don’t necessarily lose confidence in it. So, as long as they had some part of the decision and they got to use their judgment, that might actually help them use the algorithm.
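
The constrained adjustment described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' experimental code: the function name, the parameter names, and the sample numbers are hypothetical, and only the limit of five comes from the interview.

```python
def constrained_forecast(model_forecast: float,
                         human_forecast: float,
                         max_adjust: float = 5.0) -> float:
    """Return the human's forecast, limited to within max_adjust of the model's."""
    low = model_forecast - max_adjust
    high = model_forecast + max_adjust
    # Clamp the human's number into the allowed band around the model's output.
    return min(max(human_forecast, low), high)


print(constrained_forecast(62.0, 80.0))  # 67.0 (capped at model + 5)
print(constrained_forecast(62.0, 60.0))  # 60.0 (already inside the band)
```

The final answer always stays close to the model's output, while the person still gets to apply some judgment.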

Knowledge at Wharton: There was an interesting story written in the “Boston Globe” about your research. They brought up an interesting case about restaurants in Norway, I think it was, where instead of having a sommelier decide or tell you the best wines, they had this information plugged in through tablets.

Massey: Why do you think people might be more accepting of wine recommendations via an algorithm than a real sommelier?

Dietvorst: I would guess they’d be less accepting.

Massey: You think less accepting. Why?

Dietvorst: I think they would. I think when the domain that you’re operating in is one where people think that humans have a special insight that machines couldn’t understand, they’ll be especially unlikely to use an algorithm. That’s what I would think. Obviously, we haven’t tested that. A computer could never taste a wine. So, I would think that people just wouldn’t trust it.

By the way, I bet a computer could ‘taste’ wine. … I bet people would hate to know about it, but I bet it could.

Massey: And if it hasn’t yet they probably will —

Dietvorst: They will figure it out.

Knowledge at Wharton: This seems like it has opened the door to a variety of areas to explore, to see how this data affects a lot of different avenues in real life.

Dietvorst: Absolutely. This is why we felt early on it was so important to do the research because the applicability is only growing — with the big data blowing up, and more and more people trying to provide algorithms for decision making in all kinds of domains, we need to better understand what is it that helps people get over the hurdle.

One of the motivations for this work is just a deep understanding of how biased people’s judgment is, how much [help] they really need, and how difficult it is to fix that judgment.

Knowledge at Wharton: And obviously, bias is a big part of a lot of things that we deal with in life.

Dietvorst: Oh, yeah, absolutely. It’s not necessarily small things like wine decisions or dinner; it can be very large things like retirement savings.

Knowledge at Wharton: From where you are right now, what are those next steps in this research?

Dietvorst: We’re finding that letting people give any of their own input and change an algorithm with their own judgment makes them much more accepting of it and much more likely to use it. We’re running this other study with stakes to see if people are more or less willing to use an algorithm when the stakes are high versus low for the decision.

We’d also like to bring this to the real world and try to do an experiment actually in a company with experts making these decisions.

Massey: To some extent, as we talk to organizations as a normal course of business doing research and consulting, we see some things that work and some things that don’t work. There are applications everywhere. There was a phenomenal piece on Slate [recently] on credit scores and how there’s always been some mystery on what [went into those] scores. And now some organizations are going into your digital footprint to come up with ancillary inputs into these credit scores. And so there are more and more algorithms going into these very important decisions. This scares people.

…So as much as we want to improve people’s judgment, we have to recognize that you can cross a line, and that people have some natural and understandable trepidation about it.

Knowledge at Wharton: And it’s taking us even more and more onto a digital base. As you said, big data’s blowing up but it’s going to continue to grow tenfold as we continue.

Massey: Right. I just think it’s really important for those of us who to some extent advocate algorithms in many cases as improvements on human judgment — we have to get better on what it takes to get people to accept them. And one insight from the consulting world is that you have to be humble about your algorithm. And you have to be open to input from subjective judgment and from experts.

And in fact, I think that humility is one of the most important things you can bring to these conversations. … You recognize, “Look, my model’s imperfect. … And I know you’ve got wisdom that is legitimate wisdom.” And we’ve got to figure out some ways to bring these things together.

It’s one of the things that motivates the experiment that Berkeley’s been talking about where we give them some discretion. So [we say], “Here’s an algorithm and it’ll help you, but we give you a little discretion to modify it.”

Knowledge at Wharton: Do you find yourself tending to fall more on the side of the advice of a person in certain situations over an algorithm?

Massey: [laughs] Yes.

Dietvorst: I definitely have a gut reflex, [but] because of the research I’ve done, I don’t go through with it. I try to trust the data and trust computers. …But no, I feel that same reflex that everyone else does.

Massey: Berkeley’s dissertation advisor and our co-author, Joseph Simmons here at Wharton … is such an advocate of avoiding these biases, and trying to pay attention to the literature and all that we’ve learned over the last 40 years about judgment biases.

Knowledge at Wharton: But that’s probably got to be the biggest hurdle for people to kind of clear: to be able to get past those biases and have that trust in the data. For most people, it’s not normal to do that.

Massey: Right. I’ll give you an example from my life. I collaborate with a former student of mine, Rufus Peabody, on football rankings — the Massey-Peabody rankings. And we talk about them — God, we talk about them forever on the “[Wharton] Moneyball” show. But last year we started — this was originally done for professional football. And from these rankings, you can make predictions about games. But my partner is a professional gambler, and so we always want to compare ourselves against the market, against the line.

And it is a bit of a long story, but it captures this exactly because last year we started doing [Massey-Peabody rankings for] college football for the first time. Texas is having a shaky season.…

Knowledge at Wharton: That’s an understatement.

Massey: [laughs] It was shaky in the beginning, and then it really went off the rails later. Our model didn’t hate them as much as [it] needed to, I thought. I thought the model liked them more than I thought. So, then we come up to our rival game against Oklahoma. And every Texas fan that’s paying any attention at all thinks this is going to be an embarrassment. It’s going to be a Broncos/Seahawks-esque embarrassment. And our model not only liked them, it liked them enough to make them a pick. If we disagree with the market enough, we say, “This is one of our picks for the week.”

And Rufus writes me early in the week and says, “One of our picks is going to be Texas over Oklahoma.” You know, they were favored to lose. They were supposed to lose but we thought they wouldn’t lose as much.

Knowledge at Wharton: But they still cover the spread.

Massey: Well, I thought he was joking. I thought he was pulling my chain basically because he knew how much I hated Texas that year. I thought, “There’s no way we’re going to do that.” And [he said], “What? We’ve never overridden the model before.” And I said, “Well, the model’s never been this wrong before.” [laughs] “By God, we are not making Texas a pick.” And he said, “Well, listen, OK, we can do that, but we do have a bet: between me and you, we’re going to have a bet.”

We had done some web work that cost us some money, and [I said], “If I get it wrong, I’ll pay for that website work.” And he said, “But what if they win? What if they win outright?” You know, they’re like 14-point underdogs. I thought they’d lose by 21, 24. And I said, “Well, it doesn’t matter, Rufus, because it’s not going to happen. So, what do you want to happen if they win outright?” He said, “Instead of being Massey-Peabody, let’s be Peabody-Massey for a week. In The Wall Street Journal, we’ll be Peabody-Massey. On the website, Peabody-Massey. On our Twitter account, Peabody-Massey. If Texas wins against Oklahoma, we will be Peabody-Massey for a week.” [laughs]

And they went out and won. And we published that week, all week long, as Peabody-Massey. … This is what I’m talking about. That’s an example of here’s me preaching this stuff, teaching this stuff, researching this stuff, and I still want to, in that circumstance … because it’s this place where you care a lot, and you feel like you have a lot of expertise, you want to override the model.

Knowledge at Wharton: It just seems like, whatever the data is, there are situations in which people are just unready to give up their belief in their own knowledge. It’s a hard hurdle to overcome.

Dietvorst: I agree. And … even if, in certain domains, people have learned to use a model or an algorithm, [consider when] you’re making an important life decision. Let’s say you’re in the hospital. Do you want to trust the doctor or a computer about whether to get surgery? That [trust] might not carry over. So this is one of those things where even if people can learn it for their job or in a certain domain, it might not carry over to other domains in their life.

Knowledge at Wharton: Is it something that we will see grow in terms of the trust in the data from algorithms and computers over the years?

Massey: I think in a lot of domains, it’ll become automatic. …Amazon recommends products for us. Netflix recommends movies for us to watch. And we don’t even think twice about that. But those are algorithms telling us what to do, making predictions. So I think more and more, that’ll become a natural thing.

Dietvorst: We don’t think twice about getting the recommendation. We do think twice about using the recommendation. We’re not yet completely relying on it.

Knowledge at Wharton: But just using those two examples, I think that it’s easier to just trust that information that is provided to you because of the success those companies have had.

Massey: Right. And there is that track record. And I find another example is driving instructions.

Knowledge at Wharton: GPS, yes.

Massey: And this is a little bit different because … you wouldn’t think it’s probabilistic…. [But] in the world of traffic incidents and changing traffic patterns, it is a little probabilistic.

Knowledge at Wharton: I will fall into that category because on my iPhone — especially if it’s somewhere in my neighborhood, and I’m not 100% sure — I’ll get the directions. And sometimes, I’ll think, “No, they’re sending me five miles in the wrong direction.”

Dietvorst: The funny thing that’ll happen also is that the time it’s right and you were wrong, you won’t remember that.

Massey: Or you won’t know. … You won’t get the counter-factual.

Dietvorst: But the one time it sends you on a route that takes too long, you’ll never forget that.

Knowledge at Wharton: That’s right.

Massey: But it is an interesting question: What happens if you have a very long history? And so, we’re actually very naturalistic in our experiments. We give people 10 trials and they go 10 trials again. We pay them for the trials. We keep them for 20 minutes or so. As experiments go, it’s really not bad. But what happens when you’re working with the new driving app for three or four months and you have many, many, many instances?

So, for example, Waze — do you all use Waze? I love Waze.

Knowledge at Wharton: No.

Massey: This is something that Uri Simonsohn, another colleague of ours, evangelized for a while. And I think it’s the best driving app.… They have 6,000 people in Philadelphia with Waze so they can know what’s going on all the time. My wife resisted Waze for the longest time. But I see her resistance getting smaller and smaller as we have sort of successes. Now, she’ll still complain. But I find myself saying things like, “Just trust Waze. Just trust Waze.” And I’ve learned just to trust Waze.

Knowledge at Wharton: So the comfort level that people have with, let’s just say apps and such — is that how our trust in algorithms is going to grow and grow over the next few decades?

Dietvorst: Right, they’ll make more and more apps that do more and more things in our everyday life and we won’t even realize, “This is an algorithm making forecasts and we’re trusting it.”

Knowledge at Wharton: And people won’t see them as intrusive as maybe we have seen in the past?

Massey: But we want to get them more accepting on a more conscious level as well. Berkeley and I talk about our failures in accepting algorithms, but we’re getting better. Every year that goes by, we get better at it, and we want people to get better at it.

Dietvorst: And the thing is, the really unique important decisions are some of the ones that are the most important to surrender control over.
