Research from Wharton’s Judd Kessler reveals a significant gender gap in the way men and women talk about their own accomplishments at work, with men displaying more self-confidence than women who perform just as well or better. He explains why employers must be mindful of this gap, especially when evaluating performance.

Transcript

Dan Loney: What led you to study this topic?

Judd Kessler: We were very interested in a question that economists had not really studied that much, which is how do people subjectively describe their performance? It’s one thing to answer a question like: how many units did you sell this year? Or how many new clients did you sign on? But a lot of communication about performance and ability is subjective. There’s no right answer.

You have to describe something that’s kind of hard to pin down, and economists had not really looked at that. But we think that is something that is quite important in the way that people perceive you, so we wanted to look at that style of question being asked.

Loney: Your research showed that women systematically rate themselves lower than men on work performance even when their work may be viewed objectively as being better. Can you account for the difference?

Kessler: This was one of the things that drew us to subjective questions and the way people answer them. We had this hypothesis that not only is this important and understudied, but it’s an area where we might see a gender gap, where, conditional on the same performance, men would describe that performance more favorably.

My co-author on this work is Christine Exley at Harvard Business School. If you asked us about the same paper, I might say, “Oh, it’s phenomenal!” And she might say, “Oh, it’s pretty good.” That difference, the way that we talk about the work, even though it’s the same project that we’re both a part of, was the dynamic that we thought might be interesting to explore.

Loney: Are there subtle ways that this might show up in the office?

Kessler: We were interested not just in the performance reviews that you might get asked once a year where the employer will sit you down and work through a set of questions. We were also interested in the more commonplace and often subtle interactions that you have with your colleagues and your supervisors, where you’re just talking about work and people are getting an impression of how well you do your job and whether you might be able to take on more difficult challenges.

If men systematically talk about their prior performance and their underlying ability in more positive terms, that might change the way that people perceive them relative to their equally capable female colleagues. That was what pushed us a little bit more towards these more subjective questions and the ways in which people typically communicate with adjectives. We were still economists who wanted to study this, so we still asked quantitative questions. We asked things like, how much would you agree with the statement, “I performed well,” on a zero to 100 scale? We can still get a quantitative measure, but it’s clearly a subjective question that doesn’t have a correct answer.

Loney: What is the potential impact for women?

Kessler: It’s a little tricky because what we don’t want to do is say that women should necessarily self-promote more, that they should necessarily talk more favorably about their own performance. They are in a situation where it’s possible that there will be backlash, and perhaps differential backlash for speaking too positively about their own ability and performance. It’s possible that the gender difference that we observe is a response to social training that people go through, where when they do talk favorably about how well they did, they might be met with harsher responses, so they’ve learned not to push it on those adjectives.

The study that we did took away the possibility for backlash. There wasn’t anyone who was going to respond in a negative way to what the subjects in our study said. In fact, they didn’t even know if there was anyone responding. They didn’t know the gender of the person. But it doesn’t mean that those same forces weren’t influencing the way that men and women talked about how well they did.

The advice is not to the women or the men about how they talk about themselves. Any takeaways from the research have to be on how we elicit information about people’s performance and ability, and potentially not relying so much on the way that people talk about themselves.

Loney: Take us through the research and what you were able to decipher.

Kessler: This was a project I was really excited to be a part of, and it ended up with a lot of parts. There were about 4,000 study subjects who were recruited from an online labor market platform. The first thing they did was take a math and science test: 20 questions taken from the Armed Services Vocational Aptitude Battery, which is a test used by folks like me as a measure of cognitive ability. But the questions we picked were math and science questions. We asked them, how well do you think you did? Out of 20 questions, tell me the number that you think you got correct. That’s something that economists have looked at for a while. We think of it as confidence, how many questions you think you got correct.

Then we asked our subjective questions: describe your performance on the test with an adjective that ranged from very poor to exceptional. We also asked questions on the zero to 100 scale with statements like, “I performed well on the test.” The first-line result is that men and women performed equally well on average. In fact, if you look across all of our subjects, women performed maybe half a question better. But the men rated themselves much more favorably on all of these scales. On the zero to 100 scale, men gave themselves ratings that were 25% higher than the ratings women gave themselves.

The first hypothesis that we had was maybe this is just confidence re-manifesting. So, we looked at that question of, “How many of the 20 questions did you think you got correct?” Sure enough, even though men and women on average each got about 10 questions correct, men said they got 11 correct, and women said they got eight correct. There was clear evidence that men and women had different perceptions of their underlying ability.

We wondered whether that was what was causing the gender gap in self-promotion. So, after they took the test, we told them exactly how many questions they got correct. Now we’re comparing men and women who each answered 10 questions correctly and have been told that they answered 10 questions correctly. And we asked the same questions about stating agreement with, “I performed well on the test.”

Even with the same performance and knowing they had the same performance, there was still a very large gender gap. The gender gap shrinks a little bit when you give performance information, but it’s still quite big. That told us there’s something beyond not being sure how many questions they got correct. There’s something fundamentally different about the way men and women interpret the same underlying score.

This was one of the hypotheses that we had, and we wanted to push the result a little bit further. We did a few different things. One thing was we told men and women the average self-evaluations of other people who had scored the same as them. Now the men and women both know they answered 10 questions correctly, and now both were told what the average self-evaluation is of folks who get 10 questions correct. Still, we see the gender gap unchanged.

We had another hypothesis, which is maybe women just have higher standards than men. Maybe it’s just the case that if you ask a woman how a certain score is, they always think that it’s less good than a man would. To get at this, we did two things I think are informative. We asked them to describe somebody else’s performance. So, now you’ve taken the test, you don’t know that you’ve answered 10 questions correctly, but we’re going to ask men and women to describe the performance of somebody else who answered 10 questions correctly.

It’s their performance. They don’t know it’s their performance, but now it’s not about themselves, it’s about somebody else. When we do that, we see no gender gap. When describing a third party, men and women use the same adjectives for the same score. It’s only when they’re talking about their own performance that men and women differ.

There is research from economics and other fields suggesting that gender gaps are related to the stereotypes surrounding the tasks. The unfortunate stereotype around math and science tasks is that men are expected to do better than women, even though in our case of course women do better than men. So, we picked a task that is less male-typed, that’s more neutral, and where, if anything, folks think that women might outperform men. And that was a verbal task. Instead of doing a test with 20 math and science questions, the subjects are doing a test with verbal questions. And when we do that, we again see no gender gap in the way that men and women talk about their performance when the performance is equally good.

The thing that’s happening here is that there’s something unique about this math and science domain, where the stereotype is that men do better. Only in that domain are women saying that they did less well, even knowing that they have a particular performance and that performance is the same as that of a man who says he did great.

Loney: It brings up the topic of self-promotion and our perception of how we’re doing in comparison to what we believe our colleagues may be doing. I wonder, can there be too much self-promotion for either men or women?

Kessler: It’s hard to get at this because maybe women don’t say that they’re phenomenal in math and science because they’re worried that doesn’t fit stereotypes of women, and people will react negatively to them. There is a lot of evidence of backlash for actions that don’t fit stereotypes that could be playing out here as well. More research [is needed] about exactly where the line is for what’s the appropriate amount of positivity to have. How much can you overinflate your own sense of self?

In much of what I’ve described, the person who is answering the subjective questions about how well they did, we told them that their answer would be given to another study subject who would use only that self-promotion answer to decide whether to hire the person and how much to pay them. That’s where the self-promotion language comes from, because this is an answer to a question that could be used to determine your pay.

Loney: You also have to look at this from the employer’s perspective and how they understand these differences in hiring and promotion.

Kessler: Yeah, that’s one of the things that I found interesting about the different versions that we did. When the study subjects knew that the employer would see the self-evaluation, they rated themselves more favorably. They did respond to the fact that there was an incentive to say that they did well. But when we took the employer incentive away, we still saw a big gender gap. To me, this says that from the employer’s perspective, it’s not the exact way that you incentivize the self-promotion questions that’s going to matter. It’s not like you can say, “We’re asking these questions, but we promise not to use them to determine bonuses.” Removing the incentives to promote is not necessarily going to change the gender gap, which was there even absent the employer. It really is something about the way that people describe their ability and performance even when it’s not being used for anything.

Loney: Can you theorize what might be the way for women to close that gender gap?

Kessler: Our takeaway is twofold. One, we think it’s not the responsibility of the women. It shouldn’t be the responsibility of the women to change the way that they act for multiple reasons. One, it’s unfair to put it on them. Two, it’s not clear that we know how to help people change the way they think about their own ability and performance.

One result that I didn’t tell you that speaks to that is the study on youth. We did a study with 10,000 middle and high school students, where we had them do essentially the same thing except without employers, and we did it with shorter math and science tests. We saw gender gaps in the way that people talked about their ability and performance in every grade that we looked at, from as young as sixth grade all the way up to seniors.

That was pretty clear evidence to us that this was not something that was coming at a particular point in time. If there is intervention to be done, it might need to happen before sixth grade. That’s the second thing: It’s not clear how we would change the way people perceive their own ability and performance.

Third is the thing I’ve mentioned already, which is the potential for backlash. It’s not clear that we want to encourage women to promote more, because they could already be optimizing given the incentives that they face. What that means is it pushes fixing the problem back to the employers, who at interview stages, at application stages, and at performance review stages should rely less on these subjective self-evaluations and more on objective measures, to the extent that they exist, in assessing ability and performance.

Loney: Where do you want to take this research next?

Kessler: We are starting to think more about how the social dynamics of the workplace or educational institutions interact with the gender gap. One thing that we started to look at in our research was whether people perceive the gender gaps, whether [they] anticipate that women are going to talk less favorably about their own ability and performance. And it does not look like the study subjects anticipate that the gap is there.

That doesn’t answer a key question, which is: do employers, who are doing the hiring and promotion and deciding on bonuses, learn over time that women talk differently about their performance? It’s possible that good employers have recognized that if they ask this question, they get the gender gap, and maybe they even correct for it to a certain extent. I think that is the next line of work, to figure out whether that observation is there, and whether the corrections are actually being made when they need to be.