Wharton's Edward Chang and Katherine Milkman discuss their new research on the effectiveness of diversity training.

In 2018, two black men were arrested in a Philadelphia Starbucks after asking to use the bathroom without placing an order. The incident of racial bias made headlines nationwide, prompting Starbucks to close all its stores for an afternoon so employees could take an anti-bias training course. That decision prompted the question: Does diversity training work? New research from Wharton aims to provide answers. A paper titled “The Mixed Effects of Online Diversity Training,” which was recently published in the scientific journal Proceedings of the National Academy of Sciences, is the result of a unique partnership with a global company that allowed researchers access to over 3,000 employees for the purposes of the study. Edward Chang, a doctoral candidate in the decision processes group at Wharton, was the lead researcher. Chang and co-author Katherine Milkman, a Wharton professor of operations, information and decisions, recently joined the Knowledge at Wharton radio show on SiriusXM to discuss their research.

An edited transcript of the conversation follows.

Knowledge at Wharton: Early in the paper, you state that there hasn’t been a lot of research on the effectiveness of these types of diversity programs. Why not?

Katherine Milkman: Well, it is actually really hard work to do. There is a large body of research that looks at the effectiveness of diversity training — it just has never been done the way we were able to do it, which is a large-scale study in a real organization, as opposed to an educational setting. And we measured actual behaviors, rather than just how people say they feel or what they say their attitudes are right after completing it. You can see why that might be hard to do, because you have to convince an organization to open itself up for this kind of a project, to let you measure the behaviors of their employees downstream. It was quite a negotiation to get there, and we feel really lucky that we found a great partner willing to let us do something of this scale and magnitude on this important question.

Knowledge at Wharton: Can you take us through the research?

Edward Chang: One of the big innovations in our study is that we did a field experiment. We randomized people into taking either diversity training or placebo training (which was training about a topic that was unrelated to bias or stereotyping or diversity). This is really important because it helps us disentangle whether effects of diversity training are due to people being willing to volunteer for diversity training, or just taking any training at all. We can really see what is the effect of doing diversity training, specifically.

We had about 3,000 participants. They were randomly assigned to take one of our online trainings, one of which was a diversity training. Through this diversity training, we used the best behavioral science we know to try to get people to reduce their biases and stereotyping, and teach them strategies about how to be more inclusive in the workplace.

Knowledge at Wharton: What were the results?

Chang: We measured the results of our training program in a couple of different ways. First, at the end of the training we had survey questions. This is standard in a lot of other research. We measured their attitudes at the end of training to see whether they showed any evidence of learning the content. But what we really cared about was measuring the behaviors afterward. Particularly, we unobtrusively measured behaviors in a variety of different ways in the months following the training to see if what they learned actually stuck and changed their future behaviors.

Knowledge at Wharton: How important is that part of it, considering the challenges of negotiating with a company to do this type of research in the first place?

Milkman: I think this is the whole ballgame because organizations are doing these diversity trainings hoping to change outcomes in their organizations and to promote diversity and inclusiveness. Not just a change in what people say they will do, but how they actually act. Will it change real behaviors? That was the missing ingredient in so much past research. Being able to measure whether we changed the way people treat one another and the way that they mentor in these organizations — it was a huge game-changer.

Knowledge at Wharton: How much of a concern is it that, when you’re offering an employee one hour of diversity training, they are doing it because they feel like they are obligated to do it?

Chang: That is a definite concern about diversity training. There has been a lot of research suggesting that one reason why diversity training might fail is because the organizations and the people in them kind of just feel like training is checking a box — that they are doing it to say that they have done it.

Some people were criticizing Starbucks because it sounded like they were just doing diversity training as a perfunctory thing to be like, “OK, we know something bad happened, now we are going to slap on this Band-Aid of diversity training.” But one of the reasons we were excited to run this research is because we are not sure what effects diversity training really has on people and their behaviors.

Knowledge at Wharton: In looking at the different pieces involved in the training here, part of this was about gender, correct?

Chang: Yes, [our partner] was primarily interested in looking at how diversity training would affect people’s attitudes and behaviors towards women. Partly, it is because it is a global company, so gender issues are relevant across countries. When you look at something like race, it’s very, very salient and important to think about in the U.S., but it is maybe not as salient in other countries. We were working with a global company and had participants from 63 different countries, and so a lot of the focus in terms of measurement and content was on gender.

Milkman: The gender issue is much more generalizable to the global population, and that is one of the reasons that we focus so much of our attention on that. I think that our results are interesting in both groups, so we can say something pretty important about both race and gender training. I am glad that we were able to do both, but I do think the thrust was on gender because of this international focus.

Knowledge at Wharton: You did this research within one organization, but can the results be translated to a larger number of organizations?

Milkman: A limitation of this is that it’s just one setting. Every employer has its own unique culture, and that could interact in interesting ways with what we have done. An important thing to note is that this is an employer that cared so much about this issue that they were willing to do this big, rigorous test with a team of researchers and put in the time and energy needed. Obviously, they are already way on one end of the spectrum in terms of their handling of this issue, and that is certainly a concern. If we were to repeat this in an organization that was struggling even more with these issues, one that wasn’t already pretty progressive in its treatment of this topic, we might see different things.

Knowledge at Wharton: What is unique about this research?

Chang: I do think one of the strengths of this research is that we were able to collect information from people from many different countries. A lot of the research on these topics is focused on just, for example, U.S. employees. We are able to get people from all of these different countries.

We are also able to do what we call heterogeneity analyses by exploiting the fact that people in different countries have different attitudes. We could use that variation to get a better understanding of what psychological mechanisms might be at work as a result of training.
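The idea behind a heterogeneity analysis can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the record fields (`group`, `arm`, `outcome`), the group names, and the numbers are all made up, and the sketch simply compares the diversity-training arm with the placebo arm separately within each subgroup.

```python
from collections import defaultdict
from statistics import mean


def effect_by_subgroup(records):
    """Return {group: mean(diversity outcomes) - mean(placebo outcomes)}."""
    buckets = defaultdict(lambda: {"diversity": [], "placebo": []})
    for r in records:
        buckets[r["group"]][r["arm"]].append(r["outcome"])
    # Only report groups that have participants in both arms.
    return {
        group: mean(arms["diversity"]) - mean(arms["placebo"])
        for group, arms in buckets.items()
        if arms["diversity"] and arms["placebo"]
    }


# Hypothetical records; these are NOT results from the study.
records = [
    {"group": "region_a", "arm": "diversity", "outcome": 4.0},
    {"group": "region_a", "arm": "diversity", "outcome": 4.0},
    {"group": "region_a", "arm": "placebo", "outcome": 3.0},
    {"group": "region_a", "arm": "placebo", "outcome": 3.0},
    {"group": "region_b", "arm": "diversity", "outcome": 3.0},
    {"group": "region_b", "arm": "placebo", "outcome": 3.0},
]
effects = effect_by_subgroup(records)
print(effects)
```

A real analysis would also need standard errors and a formal test of whether the subgroup effects differ, which this sketch leaves out.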

Knowledge at Wharton: What were the conversations like with some of these employees after they took the test?

Chang: We don’t necessarily talk to the people immediately afterward. We ask survey questions, such as multiple-choice questions. In general, we found that people were more supportive of women after taking a training. This effect was particularly true among people in countries outside of the U.S., where we think attitudes towards some of these topics were maybe slightly less progressive than in the U.S., because in the U.S., I feel like we have already talked a lot about issues of bias and stereotyping, and particularly things like implicit bias and unconscious bias.

Milkman: In fact, we have evidence that this is true. When we measure baseline attitudes among untrained employees who took the placebo training, those outside the U.S. were less open to expressing that they wanted to be inclusive of women. We measure that, and then we can see how that relates to the responsiveness to the training.

Knowledge at Wharton: What was the placebo training?

Milkman: It was basically a training on a different topic. It was about psychological safety, which is a construct that is important in the management literature. It’s ways to make your teams feel more comfortable coming forward if somebody discovers that there has been a problem. So, it’s very unrelated to diversity training.

The importance of it is that we can have two groups who both raised their hands, volunteered, and said, “Hey I want to do a training. I am willing to do a training.” Both of them experience 60 minutes of training on something, but one of them experiences 60 minutes of training on diversity and the other on an unrelated topic.

Everything about them is identical, except that one learned about diversity, so we can compare apples to apples. We can isolate the effect of our training and say it did this to people, it changed their attitudes and behaviors in this way.
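The logic of this comparison can be sketched in a short Python example. It is a simplified illustration rather than the study's actual procedure: it assumes just two equally sized arms, a single numeric outcome per person, and made-up data in which the training adds exactly 0.5 points. The function names are hypothetical.

```python
import random
from statistics import mean


def assign_conditions(participant_ids, seed=0):
    """Randomly split volunteers into a 'diversity' arm and a 'placebo' arm."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        pid: ("diversity" if i < half else "placebo")
        for i, pid in enumerate(shuffled)
    }


def estimated_effect(assignments, outcomes):
    """Difference in mean outcome: diversity arm minus placebo arm."""
    treated = [outcomes[p] for p, arm in assignments.items() if arm == "diversity"]
    control = [outcomes[p] for p, arm in assignments.items() if arm == "placebo"]
    return mean(treated) - mean(control)


# Made-up data: every volunteer scores 3.0, plus 0.5 if in the diversity arm.
assignments = assign_conditions(range(100))
outcomes = {
    pid: 3.0 + (0.5 if arm == "diversity" else 0.0)
    for pid, arm in assignments.items()
}
print(estimated_effect(assignments, outcomes))
```

Because chance alone decides who lands in which arm, the two groups should be alike on average in everything except the training content, so the difference in means can be read as the effect of the diversity training itself.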

Knowledge at Wharton: But how much information can you gain from a one-hour training? Is that the right amount of time to change attitudes?

Chang: The reason why we chose one hour for our intervention is because that is pretty common for diversity training. When we looked at how other companies implement diversity training, oftentimes it is these one-off, very short, one hour, maybe two hours kind of thing. So, we think it is pretty valid or ecologically valid — it kind of reflects the reality that we see.

But the results of our training were not large, particularly when we look at behaviors. Although we did see attitude change in most groups, for behavior change we didn’t see that much movement, particularly among the groups of people who historically have held the most power in these organizations, such as men and white people. My interpretation of our findings is that one hour probably isn’t sufficient. We can’t expect a one-off diversity training to solve all of your problems relating to bias and stereotyping in the workplace. It could be one part of a multi-pronged solution where you combine diversity training with other changes you are making to processes and structures to reduce instances of bias and stereotyping. Or it could be potentially a multi-day thing where you do training multiple times over a longer period of time, and maybe that would be more effective at changing people’s behavior.

Milkman: But we would need more data, so that hasn’t been tested either. Maybe that would also be disappointing. But the key finding here is, as Edward said, this isn’t enough. It is very clear that this isn’t enough. It may not be a dosage issue. It may be that this just isn’t the way to solve these problems in organizations and we have to make really serious structural change.

So, we don’t want to say, “Just up the length of your diversity training.” That is not our takeaway. Our takeaway is, an hour doesn’t do it, so we have to find other solutions. Maybe it is a higher dosage, maybe it is structural change. Honestly, my instinct based on the other work we have done and the other literature on this topic would be that we need to be moving towards structural change.

Knowledge at Wharton: It becomes a very important topic because there is increasing focus on the culture within a company, how employees interact with one another, and what components you want to have within the office complex.

Chang: That is definitely true. How do you change cultures is a huge question. I think if you ask people who study cultures in organizations, they probably would not say that the way you change culture is to do an hour-long training about what you want the culture to be.

Milkman: One of the things about this training is that, in some sense, your view of it depends on whether you are a glass-half-full or glass-half-empty type of person. We invited roughly 10,000 employees at this company to take it, and about 3,000 did make time for it, which maybe you think is low, only 30%. I thought it was pretty amazing that 30% of the employees would get involved.

It raises questions about if you did want to do something that was at a higher dosage and for longer, how much would your participation drop? Even an hour was a lot to ask. Those are all issues that organizations have to deal with, and I think it is part of the reason a lot have ended up with one hour as their solution. It’s just so hard to convince people to devote time to this.

Again, that points towards structural change. If you make those structural changes, you are not asking people for time; you are changing the way the organization operates — the way hiring happens, the way promotion happens, the way mentoring happens — in ways that will hopefully facilitate a more inclusive workplace.

Knowledge at Wharton: Mentoring programs were included in your research. Can you tell us about that?

Chang: We worked with our field partner to create an informal mentorship program that let us see who people choose to connect with. That was where we saw some of the most interesting behavioral effects. We designed this measure to see whether people were being more inclusive towards women and racial minorities by being willing to mentor them in the workplace.

The biggest behavioral effect we found was that for women in the U.S., our training seemed to prompt them to seek out mentorship, to use this program as a way to be proactive about seeking out mentorship from senior colleagues. So, it really does seem that the biggest behavioral effect of our diversity training was convincing women that they have to kind of lean in and be more proactive about their careers.

Knowledge at Wharton: In your research, did you find a significant difference between the acceptance of mentoring with a woman in comparison with a man?

Chang: We did not see a difference in a willingness to say yes to men versus women, but we do think that there are potential differences. We know that in general, for social networks, people like people who are like themselves. Men are typically more likely to have social networks that include other men. In particular, for senior people, they are more likely to have mentees who are men.

Knowledge at Wharton: What was the impact of this training on senior executives?

Chang: Our training was targeted at people who are lower in the hierarchy [at this organization], so we didn’t have a ton of senior executives. When you are running these experiments, you want as many people as possible to take these trainings, and there are many more people on the bottoms of organizations than the tops. So, we are not able to say a ton on how our training maybe differs in terms of receptiveness based on how senior you are. We mostly focus on the experiences of more junior people in this organization.

Milkman: We anticipated that the seniority issue was going to be hard just because the numbers are small. One of the things that this experiment was trying to do that was different from past experiments was to have a really large group that we could study. With 3,000 people, we knew that was going to mean not a lot of senior folks if we zoomed in. That is why we focused on gender and whether this was international or in the U.S. And we do see really interesting things along those dimensions.

Our key takeaway is that the effects of online diversity training really were different for these different populations. Women in the U.S. responded in the way that Edward described: they are more proactive and seem to be responding by saying, “I need to take actions to help support my own career.” We actually expected the effect of the training would be for everyone to try to help women more; instead, it was women trying to help themselves.

Knowledge at Wharton: If online diversity training was step one, is there a natural step two and step three that are already starting to formulate in your mind?

Chang: I don’t think we have a step two or step three yet, but if companies would like to partner and do more research on this topic, we would love to. We would be very receptive to that.

Milkman: I think there are a lot of things that we have to think about. One of the things that was a limitation of this is that it was online, and we know that face-to-face interactions and group interactions, where people can really have meaningful conversations about a topic, can be more powerful ways to change behavior. I think one natural next step would be to think about building the best possible training programs that are live and include dialogue between trainers and trainees, and see how much more that can move the needle. But again, I am most optimistic about structural change, especially after doing this project.

Knowledge at Wharton: As you said, it is a process just to get a company to agree to this in the first place. I would think taking that next step is going to present challenges similar to those you had in this research.

Milkman: Absolutely. This was a mountain that we moved with our amazing organizational partner. It really was a beast to do this. I think that is one of the reasons it hasn’t been done before. It was hard, right? There are a lot of legal concerns, a lot of emotional concerns that are very reasonable. Everybody is worried for lots of good reasons, so it is going to be hard to do it again.

Knowledge at Wharton: Do you think this company understands itself better after this research?

Milkman: I think they feel they have learned a lot. In fact, one thing I think was particularly interesting is that when we trained people and showed them information about gender bias, it had some positive spillover effect to both attitudes and behaviors towards racial minorities, which was absolutely fascinating to us and to this organization.

It was not clear that that would necessarily be the case. We learned a lot about which subpopulations respond to what, what the training is really doing, and the fact that it has these spillover effects to some degree. I think we all feel much smarter about what the benefits and limitations are thanks to this work.