After years of conducting research on gender bias in the workplace, Wharton professor Katy Milkman has reached a singular conclusion: Systemic change is necessary to create long-lasting progress towards diversity, equity, and inclusion. She shares some insights from her most significant studies on diversity training and hiring bias.


Dan Loney: Katy, you have done a lot of research around gender in the workplace, and there are still a lot of unanswered questions out there. What are some of those that have inspired your research in this area?

Katy Milkman: I think the most important questions are: How do we solve it? How do we solve the issue that women are still underrepresented at the top of organizations? We are trying so many things. We need to eliminate bias and change the individuals in the organizations to be more open-minded about working with women. We need to change the structure of organizations, so we can make them more accommodating to the different needs and preferences and strengths and challenges that women face.

Is it that we need to eliminate the biases that women face not by changing the shape of the organization, but the way decisions are made inside the organization? There are all these different possibilities in terms of what can fix what I think most people at this point agree is a real problem, because an enormous amount of talent is not being tapped to generate the best outcomes for organizations. And the truth is we still don’t know the right combination of solutions. But my research and the research of many whom I collaborate with is trying to pick away at some of the big questions there and get some answers.

Loney: Going back a few years, you did a study on the effectiveness of diversity training. What did you learn, particularly about how these programs affect women?

Milkman: This is a really exciting project that I got to work on with a massive team at Wharton several years ago. It came about because at the time, I was co-directing the Wharton People Analytics Initiative, along with Adam Grant, Angela Duckworth, Cade Massey, and others. The four of us felt that one of the most pressing questions in people analytics that needed to be answered was whether or not these diversity trainings that so many organizations were pouring money into were adding value. And could we build one that was really effective? That was our real goal. Let’s prove whether or not this works, and let’s build the best version of diversity training we can, given what we know about the science of discrimination and bias in organizations.

We found an incredibly brave organizational partner. It was a Fortune 500 company that was ready to team up with us and do this project. And the reason I call them brave is because it’s a really challenging thing to do as an organization, to open yourself up to testing, to see whether or not there’s potential gender bias and if you can combat it in your organization. It opens you up to lawsuits, and a lot of organizations are not brave enough to team up with academics and do that kind of science.

We also had a first-year Ph.D. student passionate about this topic, Edward Chang at the Wharton School. Now he’s a professor at Harvard Business School. We’re very proud of having mentored him and the amazing work he did leading this.

We built a roughly one-hour online diversity training program that was primarily focused on introducing people to the idea of gender bias, teaching them about the science showing that women often face backlash when they do things like negotiate for higher salaries. There are implicit biases that influence the way we judge other people, even if we don’t explicitly intend to discriminate. We have different associations with women. We expect them to spend time in the home, and it’s easier for us to sort words with women that we associate with doing work in the home than it is for us to sort those words with men. It’s easier for us to sort words related to career with men than it is for us to sort those words with women. There are speed tests called implicit association tests that show that even when you say, “I 100% support women at work. I do not hold any belief that women need to be at home, as opposed to in the office,” you’ll still show these implicit associations, because it’s like smog. It has been around you your whole life, and you absorb these stereotypes that are in the world.

We had participants take one of these tests, and then we armed them with tools they could use to combat bias at work. For instance, say you’re evaluating CVs and worried you might be giving priority to men rather than women who are otherwise equally qualified, just because of your implicit biases. One strategy that’s been proven effective for reducing gender bias in that situation is to evaluate CVs without names. It’s called blinding. That’s a way that you can eliminate bias from that process. We gave them a series of tools of that sort and walked them through some scenarios, so they could think about how to use them. We’re really proud of what we built.
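As a rough illustration of the blinding idea described above, here is a minimal Python sketch that strips identifying fields from CV records before review. The field names (`name`, `pronouns`, `photo_url`), the ID format, and the sample records are all hypothetical and are not taken from the training program.

```python
# Hypothetical identifying fields; a real process would decide this list itself.
IDENTIFYING_FIELDS = ("name", "pronouns", "photo_url")


def blind_cv(cv, candidate_id):
    """Return a copy of one CV record without identifying fields."""
    blinded = {key: value for key, value in cv.items()
               if key not in IDENTIFYING_FIELDS}
    # A neutral ID lets the reviewer refer to the CV without seeing a name.
    blinded["candidate_id"] = candidate_id
    return blinded


def blind_all(cvs):
    """Blind every CV and number them in the order received."""
    return [blind_cv(cv, f"candidate-{i:03d}")
            for i, cv in enumerate(cvs, start=1)]


# Invented sample records, for illustration only.
sample_cvs = [
    {"name": "Jordan Example", "pronouns": "she/her",
     "years_experience": 7, "degree": "MBA"},
    {"name": "Casey Sample", "pronouns": "he/him",
     "years_experience": 7, "degree": "MBA"},
]

blinded_cvs = blind_all(sample_cvs)
for cv in blinded_cvs:
    print(cv)
```

The original records are left untouched, so names can be matched back to candidate IDs once the evaluation is finished.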

Loney: When you’re talking about diversity training programs, are there downsides and limitations?

Milkman: That’s really what we set out to test by developing this partnership with a Fortune 500 company, doing a randomized controlled trial to evaluate whether or not randomly assigning some people to complete a diversity training program and others to complete an unrelated program for an hour would produce benefits for women at the organization. We did this with thousands of employees, and we measured a number of different outcomes.
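The random assignment step of a trial like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the even split between arms, the employee IDs, the function names, and the fixed seed are all hypothetical, and the study's actual procedure is not described here.

```python
import random


def assign_arms(employee_ids, seed=0):
    """Randomly split employees into a training arm and a placebo arm.

    An even split is assumed here purely for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(employee_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"training": shuffled[:half], "placebo": shuffled[half:]}


def difference_in_means(outcomes, arms):
    """Average outcome in the training arm minus the placebo arm."""
    def mean(ids):
        return sum(outcomes[i] for i in ids) / len(ids)
    return mean(arms["training"]) - mean(arms["placebo"])


# Hypothetical employee IDs, for illustration only.
employee_ids = [f"emp-{i}" for i in range(1, 11)]
arms = assign_arms(employee_ids, seed=42)
print(len(arms["training"]), len(arms["placebo"]))
```

Because assignment is random, a later difference between the two arms on a measured outcome (an attitude score, say, or whether someone signed up to mentor) can be attributed to the training rather than to who chose to take it.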

We looked first at attitudes. At the end of both the diversity training and a placebo training program that focused entirely on unrelated material, we asked people questions about their attitudes towards women in the organization and in general. We had some scenario questions related to supporting women where we asked, “In this scenario, how might you try to problem-solve?” That’s a simple attitudinal or scenario-based way of evaluating whether the training had an impact.

But we also really cared about downstream consequences. Would women be treated differently in the organization? Would people make more of an effort when they had an opportunity to nominate women for awards? Would they be more willing to mentor women? We had a number of different measures where we looked at exactly those kinds of outcomes and even what we call a little audit experiment, where an email went out and invited people to offer support and information to a new female employee or male employee. Was there a difference in the willingness to support new female versus male employees, as a function of whether you’d been through the training or not?

What did we find? Does it backfire? Does it work? Does diversity training produce the outcomes we had hoped for? I wish I had a really simple answer for you. If I had to boil it down to one thing, I would say that diversity training wildly underperforms expectations, but it’s subtler than that. On average, we see that it has bigger effects on attitudes than on actions. There’s very little evidence of movement in terms of behavior. In terms of the attitudinal lift, we do see that it’s having an impact particularly on those who belong to a subpopulation that suggests they had more room for growth, because the average attitudes in that subpopulation started out lower at baseline. This means it’s helping maybe to change the attitudes a bit more for men, in international settings in particular.

People whose behavior shifted most tended to be the people whose attitudes were most aligned with the training to begin with. In fact, women are the group that ended up changing their behaviors the most, but one of the interesting things is that some of the behavior change we saw was not the kind we were expecting. We were expecting to lead more people to sign up to mentor women if they’d gone through the training. We do see that, but we actually also saw women looking for mentorship themselves at a higher rate when they’d gone through the training, which was really interesting.

P.S., this also happens with minorities who go through the training. So, one of the key results is that through the training, we may have made it more salient to women and minorities that they needed sponsorship, that there were threats to their success in the organization, and they needed to look out for themselves and find other women and minorities to support them. That’s not normally what we think of as the goal of diversity training. The goal of diversity training is how do we ensure that our workforce, particularly those in positions of power, are providing more support to women and minorities? Instead, what we’re doing is alerting women and minorities to look out for themselves. It’s not necessarily a bad result. That may be beneficial, but it’s not the intention of the program. It’s not the reason that billions of dollars are being poured into these kinds of programs at companies around the world.

Loney: Many will think about fixing the people, but there would also have to be a focus on fixing the system, correct?

Milkman: Yes, I think that’s one of the most important takeaways from all the work that I have done and all of the work I have read from the last several decades of blossoming research on how we increase diversity in organizations. When we try to fix people and say, “We’re just going to make the managers better at treating women and minorities with respect,” or, “We’re going to make the organization friendlier by changing the way that we talk about diversity,” these solutions generally have not lived up to expectations.

A key limitation of the research we did is that we’re looking at a one-hour online training. You might see something really different if you did a week of training with a trained facilitator and really hammered these points differently. But still, every study I have seen points in the same direction: if we just try to fix attitudes, change beliefs, try to change people, it’s just much harder to do that. And this is true not only when it comes to gender diversity issues in organizations, but any kind of human biases. I tend to study decision-making more broadly. I look at gender diversity, but I also look at other biases in judgment, besides biases against certain groups of people, that can lead us to make mistakes. They’re incredibly hard to train away.

What we find in both situations is that what works much better than training is changing systems. Systems support better decisions. And that’s really what we’re finding time and again. Don’t fix the person; fix the system they’re embedded in, so the system is better structured to support the outcomes we want to see.

Loney: Can you talk about the concept of the isolated choice effect and how that works?

Milkman: Yes, this is an example of changing systems, as opposed to changing people. Edward Chang is also one of the co-lead authors on this work on the isolated choice effect, along with another former Wharton Ph.D. student, Erika Kirgios, who’s now a professor at Chicago Booth. Edward and Erika led this project, along with Aneesh Rai and myself. What we were trying to do was figure out whether or not we could restructure the way that selection decisions are made to improve outcomes for women. You can think of a selection decision as, “Who am I going to hire for this job?” Or, “Who am I going to put on this prestigious committee or panel, or put up in front of my organization and highlight as a star?” Who is going to get selected for opportunities that are important?

Normally when we think about promotions and hires, we can make those decisions one at a time, right? But another way you can make choices in hiring decisions is in sets. You can say, “We’re going to have a cluster hire. We’re going to hire five new faculty in this department in the spring.” “We’re going to put five people in this award category.” We have options, because any organization that is growing has opportunities, and they can be clustered or not. We hypothesized that when people hire or promote or select for opportunities one at a time, they focus on just the attributes of the person in front of them and don’t think globally about how that person is contributing to the diversity of the organization. They’re just looking at the one person. That’s their focus.

When they hire in sets, our hypothesis was they’re going to attend to what the group looks like. If I hire five people in a row at the Wharton School one at a time, I probably don’t even notice how those five people look as a group because I’m zoomed in looking at candidates. But if I hire five at a time, I’m thinking, “Wow, how did I end up hiring five people who were all from the exact same university and have the exact same dissertation advisor? They all look the same. They all walk the same and talk the same.” I might notice if there’s a lack of diversity because hiring in a set forces me to attend to that.

And that is what we found in study after study. When people are choosing from exactly the same applicant pool for each hire, they make more diverse selections in sets than in singletons. The set forces a focus on the aggregate, on whether you’re creating a pool that has diversity, that supports your values, that reflects the diversity you want to have in an effective organization. But when you hire in isolation, you don’t have that. That’s a structural change. I’m not changing the people who are making hiring decisions. I’m not training them. I’m not suggesting to them to focus on diversity. But what naturally happens when we look at sets, is we think about diversity. And when we look at singletons, we don’t.

Loney: You’ve also looked at how social norms affect group composition and tend to contribute to underrepresentation of women. What did you find?

Milkman: You will laugh because yet again, guess who was leading this work? My amazing former student, Edward Chang. I do a lot of work with doctoral students on this topic, and Edward was an incredibly productive student who really is passionate about understanding gender diversity in organizations. We had a great series of collaborations. This project idea came from my husband, so I just have to give him a shout-out. He’s a physics professor here at Penn, and he said to me, “Katy, I’ve noticed that it seems like when a physics department decides they lack gender diversity, they panic, they put a whole lot of effort into a hire, finding — stealing — a woman from another top institution. They get one, and then they breathe a sigh of relief, say, ‘Our problem is solved,’ and they never think about it or talk about it again.” And he said, “Does that happen? Are people just trying to grab tokens, and then they put a check box, and they quit?”

And I said, “That’s a really interesting question. Let me see if we can come up with a way to test your hypothesis, but in a setting that we think might be more consequential than academic physics departments.” We decided to take it to the boardroom. This is joint work with Modupe Akinola at Columbia and Dolly Chugh at NYU. We grabbed the universe of data on who sits on corporate boards at Fortune 500 companies, and we looked at the distribution of the number of women on boards. We realized that if there were cliffs in that distribution, meaning if there was a point at which you could clearly see a giant discontinuity in the representation of women, that might suggest something about organizations ‘satisficing,’ trying to reach a specific point and then quitting on their efforts to achieve diversity in the boardroom.

We actually did something that I’m very proud of. We created this little simulation. It’s like taking all the board directors in all the Fortune 1500 companies and playing musical chairs with them. We’re going to re-sort and reshuffle, and we’ll do it a thousand times, and we’ll figure out what the distribution of women would look like if this is how boards were created, with no attention to diversity. Then we’re going to compare that random shuffle to what we actually see and see if it looks like there are any contortions, any cliffs, suggesting a magic number above which boards stop making an effort to add women.

When we compare the two distributions, we see a huge gap. The huge gap is actually not at the magic number one. It’s at the magic number two. You can see this giant discontinuity. Boards are racing to get exactly two women, and then they stop trying. There’s a giant drop-off, relative to what you’d expect if they were just seating women at the same rate as any other member of the population that’s out there and available for board seats.
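The musical-chairs simulation described above can be sketched roughly as follows. This is a simplified Python illustration, not the study's actual code: it assumes directors are reshuffled across boards while each board keeps its size, and the toy boards (every board seating exactly two women) are invented to make the comparison easy to see.

```python
import random
from collections import Counter


def observed_distribution(boards):
    """Share of boards that have each number of women ('W') on them."""
    tally = Counter(board.count("W") for board in boards)
    return {k: tally[k] / len(boards) for k in sorted(tally)}


def shuffled_distribution(boards, n_shuffles=1000, seed=0):
    """Expected share of boards with each number of women when directors
    are reseated at random and board sizes are held fixed."""
    rng = random.Random(seed)
    sizes = [len(board) for board in boards]
    pool = [director for board in boards for director in board]
    tally = Counter()
    for _ in range(n_shuffles):
        rng.shuffle(pool)  # one round of musical chairs
        start = 0
        for size in sizes:
            tally[pool[start:start + size].count("W")] += 1
            start += size
    total = n_shuffles * len(boards)
    return {k: tally[k] / total for k in sorted(tally)}


# Invented toy data: 20 boards of 10 seats, each with exactly two women.
toy_boards = [["W", "W"] + ["M"] * 8 for _ in range(20)]

observed = observed_distribution(toy_boards)
expected = shuffled_distribution(toy_boards, n_shuffles=1000, seed=0)

for k in sorted(set(observed) | set(expected)):
    print(k, observed.get(k, 0.0), round(expected.get(k, 0.0), 3))
```

In this toy setup every observed board sits at exactly two women, while the shuffled benchmark spreads boards across zero, one, two, three, and more. A spike in the observed data relative to the benchmark is the kind of discontinuity the comparison is looking for.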

Interestingly, when we do a historical analysis, we can go back in time because who is seated on boards has been tracked for ages. We can see that there was a transition from tokenism, when one was the magic number at which boards quit, to twokenism a little over a decade ago. What’s going on? It turns out the social norm at this point is that most boards have two women. If you have fewer, you are deviating from all the others, and you might be called out in the media. You might be labeled the kind of company that’s not supporting women and putting them on your boards. Companies in the Fortune 500, which are under the microscope more than the 1500, show a greater degree of this bias. We ran experiments where we showed that when you’re choosing who to add to a group outside of the boardroom, just any sort of selection decision, social norm information is very salient and leads to these kinds of clustering effects. Nobody wants to be the outlier because they worry that that will yield negative repercussions.

I think this is another really interesting finding, and what’s important about it to me is it shows the power of scrutiny. One of the reasons that organizations, especially at the highest levels, are attending to diversity and making an effort is they don’t want to be called out. And that gives us power, because if we as a community, we as a society want to see greater diversity, then we need to continue to point out when organizations aren’t achieving it. And that is highly motivating and leads them to better behavior.

Just as we saw a tipping point where eventually tokenism gave way to twokenism, hopefully we’ll get to the point where it’s humiliating to have a board with fewer than three women. And we need to put the same pressure on minority representation. I think we’re doing better at creating that scrutiny around gender representation right now than we are at creating scrutiny around minority representation.

Loney: What should organizations do in terms of confronting gender bias?

Milkman: I have a few recommendations. One, I don’t think you should use diversity training as your solution. That is not going to fix your problem. Two, make sure you shine a light on how things are going in different groups, because scrutiny increases the social pressure to make different kinds of hiring and promotion decisions and attend to a lack of diversity. That kind of scrutiny and those kinds of social norms matter. Three, when you can, hire in sets rather than in singletons. That is going to lead to greater diversity in your hiring pool because only when we hire in sets do we seem to really attend to these issues of diversity.

And then a final point that’s unrelated to my research, but that I think is an incredibly important finding that should be used more. There has been some really wonderful research done, led by Joyce He of UCLA and Sonia Kang of the University of Toronto, showing that women, probably because of stereotypes and the backlash they sometimes face when they self-nominate for things like promotions, are very much less likely to put their hand up when it’s time to be promoted, even when they’ve performed at the same level as others. There is research also by Muriel Niederle of Stanford showing women under-compete relative to men when they have the same credentials. Think about the fact that women may be more hesitant to put their name forward and may be less willing to compete, and create structures where it’s easier because the friction runs in the other direction: everyone is going to compete, and you have to actually exert some energy to avoid it. That can create a more level playing field.

Loney: What can women do to try to level that playing field?

Milkman: One thing I think is really important is having a strong group of mentors. We know that’s important and that women tend to have weaker networks than men because of homophily, which is a tendency to affiliate with others who are like ourselves. If you’re at the top of an organization and you’re a woman, you’re going to find fewer people who are like you that you can affiliate with and chat with at the water cooler. That’s a challenge. Recognizing that and making extra efforts to look out for opportunities to connect with others, making sure you have strong networks and strong bonds — that’s really important.

I have a group in my own life that is a No Club, and this is based on research that was done by Linda Babcock and collaborators at Carnegie Mellon University, showing that women are too willing to say yes to non-promotable tasks at work. We’re too quick to do the office housework: taking notes at a meeting, organizing the holiday party, things that aren’t ultimately rewarded but can be very time-consuming. Linda and her collaborators have written a wonderful book called The No Club. Their solution was to create a group of women who were at a similar career stage, who helped support each other in saying no when it was necessary. Because interestingly, even though we’re bad at saying no for ourselves, women are just as good as men at saying no for others, recognizing when others should make a certain decision. We’re very good at arguing for other people. We’re good at arguing for ourselves, too. It’s just there’s a lot of pressure against it, and often we conform or fold in the face of that pressure.

Loney: Where is gender equality a decade from now?

Milkman: Honestly, I’m such an optimist, especially after doing the work we discussed on twokenism on corporate boards and seeing how things have evolved. I wish they’d evolved a lot faster, but we’re seeing progress in the right direction, and I think the pressure and scrutiny have accelerated. I’m feeling very hopeful that things are going to continue to get better. And there’s certainly a lot of scientific attention on this question.

When I was a graduate student starting to do research related to race and gender bias and how we can solve it, it was not a very popular area. I think as it has become easier and easier to collect massive data sets, and as A/B testing has become more straightforward to do in organizations and more accepted, maybe in part because of tech being such a big part of the corporate world, it’s becoming more a part of the culture. We’re accelerating insights about what works. As those insights become more widely adopted, along with the increased attention to these issues and desire for equality, I’m optimistic that we can use science to get where we want to be.