Uncovering Bias: A New Way to Study Hiring Can Help

Research has shown how easy it is for an employer’s conscious and unconscious biases to creep in when reviewing resumes, creating an uneven playing field that disproportionately hurts women and minority job candidates.

It’s an issue that’s top of mind for researchers as well as job candidates: Labor economists have long struggled to find effective ways to study the hiring process. One widely used method involves sending out fake resumes and drawing conclusions based on how employers react to them. Known as an audit study, this method can be problematic because it limits the types of industries and positions that can be studied, and it can draw the ire of HR specialists who may encounter the fake CVs while trying to recruit real candidates.

New research by Wharton business economics and public policy professors Judd Kessler and Corinne Low and doctoral student Colin D. Sullivan aims to pioneer a way for researchers to study hiring without subterfuge. While testing this new method, called incentivized resume rating, with companies recruiting Penn students, they uncovered evidence of how bias seeps into the hiring process of some of the world’s top firms, many of which have a stated commitment to diversity. One example: They found that for jobs in STEM fields, women and minority candidates with 4.0 GPAs were treated the same as white male candidates with 3.75 GPAs.

Incentivized resume rating was inspired by a previous study of Low’s about online dating. Using this method, hiring managers knowingly review and rate fake resumes and are then matched with real-life candidates based on their expressed preferences.

“We didn’t have to trick employers into doing this,” Low said, contrasting the team’s study with the audit methods. “That might not be worrying when it’s just one researcher sending out a couple thousand resumes. Some employers spend 30 seconds reviewing each resume, so it’s no big deal. But in the last 10 years or so, 90 of these studies have been published. If each takes a couple of thousand resumes, that’s a large cost you’re imposing on employers — you’re wasting their time.”

The paper, “Incentivized Resume Rating: Eliciting Employer Preferences without Deception,” is forthcoming in the journal American Economic Review.

Matching Real Candidates with ‘Franken-resumes’

The incentivized resume rating study was done in partnership with Penn’s career center, using students who were searching for their first full-time jobs post-graduation. Participating companies included top names in fields such as finance and technology, among them firms that might have been difficult to reach using the audit study approach, Low said.

“Audit studies rely on sending out cold resumes, which means we can only look at the employment preferences of firms who will respond to those,” she said. “A lot of prestige firms only hire through on-campus recruiting or if you have some other type of relationship with them; if not, you’re not on their radar at all.”

She added that these firms are also hiring for highly technical positions that don’t usually show up on free job listings services like LinkedIn or Indeed. Audit studies are typically limited to openings that appear on such sites because they are designed to respond to open calls for resumes. “It’s a superstar economy and you get such a return from going to the right schools, working at the right firms,” Low said. “But we actually know very little about the preferences of those firms and what helps you get hired at those firms.”

Because employers were in on the process from the beginning, the researchers were able to study both their interest in particular candidates and their views on what Low called a candidate’s “get-ability,” or how likely the firms thought they would be to successfully recruit that person.

“If the employer knows that they’re not really offering a top job or internship and they get a really top-notch resume, they might think there’s no point in calling that person back because they will get snapped up by a top firm and the employer knows that it can’t compete,” Low said. This can cause problems for interpretation — for example, in the audit literature employers tend to call back recently unemployed candidates more than candidates who currently have jobs. “The audit literature usually treats getting called back as good — it means the employer wants you,” explains Low. “But, I doubt the employers really want unemployed candidates more. Rather, it’s about who they think is available and isn’t going to be a dead-end. In our study, we can directly measure the difference between who employers like the most and who they think is likely to come.”

The Wharton team’s study asked all of the employers to rate resumes based on quality and get-ability. Having ratings, rather than an up/down callback decision, has another advantage: It allows the researchers to see “a lot more data on what they like and what they value and at which points in the quality distribution they value it,” Low said. In an audit study, “people who don’t get called back are in a no man’s land where we don’t know how an employer felt about them,” she added. “If you’re trying to design an algorithm or mapping of employer preferences, there is all this missing data. We understand what pushes you over the bar, but we don’t understand anything about employer preferences lower down on the quality distribution.”

Each participating firm rated 40 randomly assigned fake resumes. The fake resumes were developed using software that scraped real ones to create banks of data for academic major and school, work experience and leadership skills. “We created a tool so something that looks exactly like a resume is created live as an employer clicked through,” Low said. “It was pulling randomly from that underlying database of characteristics.” Because so many different permutations were available, it’s unlikely that any two employers saw the exact same pool of what Low called “Franken-resumes.”

Low said the researchers paid particular attention to how they randomized the work experience on resumes, including the prestige of undergraduate internships and whether the fake candidates listed part-time or summer jobs outside their educational goals, such as waitressing or retail positions. The fake resumes were also designed so employers saw names that clearly conveyed candidates’ gender and ethnicity.
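The randomization described above can be illustrated with a minimal sketch. It is not the researchers’ software: the field names, the placeholder values in `BANKS`, and the functions `build_resume` and `build_pool` are all hypothetical, and the real tool drew its characteristics from data scraped from actual resumes.

```python
import random

# Hypothetical characteristic banks. Every value here is a placeholder;
# the study's banks were built from real scraped resumes.
BANKS = {
    "name": ["Name A", "Name B", "Name C", "Name D"],
    "major": ["Economics", "Computer Science", "History"],
    "gpa": [3.2, 3.6, 3.75, 4.0],
    "internship": ["Prestigious internship", "Regular internship"],
    "summer_job": ["Waitress", "Cashier", "Lifeguard", None],
    "leadership": ["Club president", "Team captain", "Volunteer lead"],
}


def build_resume(rng: random.Random) -> dict:
    """Assemble one fake resume by drawing each field independently at random."""
    return {field: rng.choice(options) for field, options in BANKS.items()}


def build_pool(n: int, seed: int) -> list:
    """Build the pool of n fake resumes shown to a single employer."""
    rng = random.Random(seed)  # a per-employer seed keeps the pool reproducible
    return [build_resume(rng) for _ in range(n)]


# Each participating firm rated 40 resumes.
pool = build_pool(40, seed=7)
print(pool[0])
```

Because each field is drawn independently, characteristics such as name and GPA are uncorrelated by construction, which is what lets differences in ratings be attributed to a single characteristic.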

Biases and ‘Get-ability’

The ratings from the fake resumes were used to match the companies with real candidates from a bank of hundreds of actual CVs from Penn students. Among the paper’s key findings:

— Employers recruiting in humanities and social sciences did not rate female or minority applicants lower on average, but employers recruiting in STEM rated them statistically significantly lower, leading to the previously mentioned effect that a white man with a 3.75 GPA was rated the same as a woman or minority candidate with a 4.0. The authors attribute the effect to unconscious, or implicit, bias, since firms report highly valuing diversity.

— Across the board, employers gave less credit to female and minority candidates for having a prestigious internship. “It was quite a big effect,” Low said. “Women and minorities only got about half the boost that a white man would have.”

— Employers in general rated female and minority candidates lower in “get-ability,” meaning they believed those candidates were less likely to accept a job offer. “Companies say that they value diversity and think that everybody else is doing the same, and that they’re all going to be squabbling over these candidates,” Low said. “But our research shows they don’t actually have those preferences.”

— Employers placed significant value on the quality of the internships candidates held prior to their senior year in college. Low noted that firms indicated that they would choose a candidate with a 3.6 GPA and a prestigious internship (think a position at a top consulting firm, investment bank or brand) over a candidate with a 4.0 who didn’t have that type of experience.
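One way to read the 3.75-versus-4.0 finding above is as an exchange rate between a rating penalty and GPA points. The sketch below assumes a simple additive linear rating model, which is an illustrative assumption and not the paper’s estimated specification. The coefficients `GPA_COEF` and `GROUP_COEF` are hypothetical numbers chosen only so the arithmetic reproduces the article’s example.

```python
# Hypothetical coefficients in rating points, chosen only so the arithmetic
# reproduces the 3.75-versus-4.0 equivalence; they are not estimates from the paper.
GPA_COEF = 2.0     # rating points gained per GPA point
GROUP_COEF = -0.5  # rating change for a female or minority candidate (STEM)


def rating(gpa: float, female_or_minority: bool) -> float:
    """Predicted rating under the assumed additive linear model."""
    return GPA_COEF * gpa + (GROUP_COEF if female_or_minority else 0.0)


def gpa_equivalent(coef: float, gpa_coef: float = GPA_COEF) -> float:
    """Convert a rating coefficient into the GPA change with the same effect."""
    return coef / gpa_coef


print(rating(3.75, False), rating(4.0, True))
print(gpa_equivalent(GROUP_COEF))
```

With these made-up numbers the two candidates receive the same predicted rating, and the penalty works out to a quarter of a GPA point. The same conversion could be applied to any other coefficient, such as the boost from a prestigious internship.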

Low said the study also shows that employers placed no value on having a “work for money” job such as waitressing or cashiering during the senior-year summer. “Students got no credit for being a lifeguard or working as a barista or being a cashier even though those jobs could actually build some really useful experience,” she noted. “That tells us that it might be particularly challenging for students who come from lower socioeconomic backgrounds and need to work to earn money in the summers to get these top jobs.”

For top employers who want to diversify their workforces and identify nontraditional candidates, the results of the study offer a stark reminder: It’s easy to say you want to think outside the box in hiring, but companies are still struggling to do it in practice, with both conscious and unconscious biases getting in the way.

“They are excluding the exact type of candidates they say they want to be interested in,” Low said.

What’s Next

Using AI or machine learning to screen resumes is a growing trend, but Low cautioned against assuming it will solve hiring biases. “Firms need to remember that if you have some of these biases, they’re going to get hard-wired into the algorithm. You have to think very carefully about how to strip that out.”

In terms of other research, Low, Kessler and Sullivan are making the incentivized resume rating software they built and the algorithm they used to match employer preferences to real job candidates available to other researchers who want to use it to conduct their own studies. “We’ve already had inquiries from other researchers who are interested in this technique,” said Low. “There are so many interesting and important questions in labor economics that have been hard to answer with resume audit studies for one reason or another, and our methodology gives researchers another tool to use.”

Low thinks the results from Penn are especially important because they shed light onto an understudied domain in hiring. “There’s this discussion in economics that sometimes experiments are too focused in where they can look, rather than where the most interesting questions are — it’s been compared to looking for your keys under a light post because that’s where the light is, rather than where you lost them. With resume audit studies, we could only study one type of firm — the ones who hired by going through cold resumes. So that was a lot of administrative jobs, it was often smaller firms.”

The firms hiring at Penn, she explained, play a key role in the overall economy, and their practices were “left in the dark” by resume audit studies. “It’s important to know what opportunities are being made available to top-tier college grads and whether those opportunities are equal.” In other words, Low hopes their method will help labor economists shine light in new places.
