An Imperfect Test: The Problem with Job Performance Appraisals

Wharton management professor Peter Cappelli has spent decades studying the complicated dynamics of employment. In a post-recession world, his research is more timely than ever as companies large and small struggle to adapt to a new normal that relies on fewer employees handling a larger, shifting workload. One practice that has persisted in this changing business landscape is the ubiquitous performance evaluation, which Cappelli describes as universally despised by both supervisors and subordinates.

In their latest research, Cappelli and colleague Martin Conyon, a professor at Bentley University in Massachusetts and a senior fellow at Wharton, question the usefulness and accuracy of performance appraisals and find some surprising answers. Cappelli, who is also director of Wharton’s Center for Human Resources, discussed their findings in a recent segment on the Knowledge at Wharton show on Wharton Business Radio on SiriusXM.

An edited transcript of the conversation appears below.

Knowledge at Wharton: Is the performance appraisal as important now as it was 20 years ago?

Peter Cappelli: It’s more important in the sense that more people have to do it. If you look around the United States, there haven’t been a lot of recent studies. But the last ones that were done show that more than 90% of the work force has a performance appraisal. Basically, if you’re not in a union — where they tend not to get them because of collective bargaining agreements — you’ve got a performance appraisal. The federal government mandates it for employees. State governments do it. The Army does it. The Navy does it. I think the big change has been when you leave the United States. It used to be kind of a U.S. thing, but now you see them all around the world.

Knowledge at Wharton: Are companies seeing a level of importance to performance appraisals that they want to bring to their business strategy?

Cappelli: I think some of it is just the increasing attention to the workforce and recognizing that managing your employees is a smart thing to do, given how important they are to the organization. You’re crazy if you don’t, right? And a lot of companies outside the U.S. just copy what they think are best practices in the U.S., even if they’re not sure why they’re doing it. I think that’s why it’s spreading around the world. You see it all through India and China. Unless you get to the countries like in the Middle East, which still have Soviet-era labor and employment laws — you probably don’t see them very often there. But otherwise, all over.

Knowledge at Wharton: The work with Martin Conyon and this paper is trying to bring together information from a variety of sources about performance appraisals?

Cappelli: The thing about performance appraisals is they are ubiquitous. There’s probably nothing in the field of management that is more common. And there’s also almost no practice in the world of business that people hate more. The evidence on this is pretty overwhelming. It’s also surprising how little we actually know about it. There’s an awful lot that’s been done by psychologists on little slices of the performance-appraisal question. Mainly, what psychologists are interested in is, how did the person [doing the] rating and the person being rated get along? And how do the characteristics of the rater and the ratee affect the results? One of the things that we know from this is one of the best predictors of your score is bias. That is, how you and your appraiser map onto each other. Are you similar? [Then] you get higher scores. The more different you are in terms of ethnicity or age or sex, the less well you’re going to do.

That’s one of the things we know. But how do they actually work inside companies? Quite remarkably, almost nobody has looked at this. We got data from a large Fortune 50 company on all their performance appraisals over a 10-year period. There were a couple of the questions that we were after. One of them is, there’s a kind of view in a lot of places and among a lot of executives that employment is like a contract. At the beginning of the year, you set goals, then we assess how well you’ve done. At the end of the year, we give you a pay increase based on how much you have achieved of your goals and how well you’ve done, maybe compared to everybody else. But there’s another view that it is not like a contract. That it’s really kind of a relationship. If you think about employment, you don’t really have a contract with your boss. The boss is telling you to do different things all the time. And based on what they’re hearing from their bosses, [employees] decide, “Oh, I’ve got to go this way or that way.” Your circumstances are unpredictable, too. It could be, “We’ve got this goal.” But then business collapses, and we change the goal. Or even if you’ve got the same goal, we have to adjust the target. There’s all kinds of stuff that’s in play, so it’s not really a contract. It’s kind of a relationship.

One of the questions that we wanted to look at was to what extent is a performance appraisal a contract, and to what extent is it a relationship where it is used to encourage you? We also wanted to see some basic things. There are some people who claim that it really doesn’t drive very much about your outcomes. Merit pay is based on something else. It’s about bias or how the company is doing, and that if you get cozy with your supervisor, you get good appraisals. If you don’t, then you get bad ones. But here’s maybe the biggest thing, which we weren’t so interested in academically when we started this. But practically, it’s really important. Do people who perform well always perform well? And people who perform poorly, do they always perform poorly? The reason this matters is because there is a very prominent theory in the practice of management — something that Jack Welch made famous — about the A-player, B-player, C-player model. The folks at McKinsey & Co. were making a similar case that there are really good executives and there are kind of lousy ones. The big thing you want to do if you believe that is to hire the good ones and get rid of the bad ones. If that’s the story, then management’s kind of simple, right? You just hire the good people, screen them and see how they do. If they do bad, out they go.

As far as we can tell, no one has ever looked at this before or at least published it. Are the people who do well always doing well, or not? If we know your scores this year for everybody in the company, how much of next year’s score could we predict or explain? If the good people are always good and the bad people are always bad, we can explain 100% of your scores because next year’s score will be identical to this year’s score. If it’s random, which would be kind of astonishing, then it would be zero. There’d be no relationship between how people on average perform this year and how they perform next year. The good people could be good or bad, and the bad people could be good or bad.

Knowledge at Wharton: But you would think that they would follow a pattern. If you’re good in 2014, unless something has drastically changed, you’re going to be pretty good in 2015 as well.

Cappelli: Right. It’s between zero and 100%. If you think this A-player, B-player, C-player model is right, it’s going to be closer to 100. If you think it’s all just random or it’s kind of noise or people vary a lot, you’re closer to zero. So, that’s the question.

Knowledge at Wharton: I’m going to say that it would probably be closer to 70 or 75.

Cappelli: That’s a very common answer. People in human resources guess 80%. The correct answer is 27%, so it’s way closer to zero than it is to 100%.
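To make the question concrete, here is a minimal sketch of how someone might check it on their own data. It assumes that "how much of next year's score we can explain" means the share of variance (R squared) in a simple one-predictor regression, which equals the squared correlation between the two years; the interview does not say exactly how the 27% figure was computed. The function names and the sample scores are hypothetical and purely illustrative, not data from the study.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def share_explained(this_year, next_year):
    """Share of next year's score variance explained by this year's score.

    Assumption: a simple linear regression with one predictor, where
    R squared is the squared Pearson correlation.
    """
    return pearson_r(this_year, next_year) ** 2


# Hypothetical appraisal scores for the same five employees in two years.
scores_year_1 = [3.0, 4.0, 2.0, 5.0, 3.5]
scores_year_2 = [3.5, 3.0, 3.0, 4.0, 4.5]
print(f"Share explained: {share_explained(scores_year_1, scores_year_2):.0%}")
```

A result near 100% would fit the A-player, B-player, C-player view; a result near zero would mean this year's score says little about next year's.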

Knowledge at Wharton: Why so much lower? I would think that it would be almost an automatic that it would be on the higher end.

Cappelli: Many people seem to believe that, especially people in human resources. But when I ask them if they have ever actually looked at it, the answer is no. They just assume it’s that way. Maybe they assume it’s that way because that’s what you hear from the A-player, B-player, C-player kind of story, and you could see some of this is a cognitive bias. There’s something in psychology known as the fundamental attribution error. It means that when you see somebody behave in a particular way, we are inclined to assume it is because of who they are rather than the circumstances. The classic example is somebody racing by you on the expressway going home. They’re driving on the shoulder and whipping past. Your inclination is to say, “That guy’s a jerk,” rather than to even entertain the idea that maybe it’s an emergency. We seem to be wired to think everything is due to the person. If you believe that, then you would be inclined to think the A-player, B-player, C-player model is right and good players this year are going to be good players next year, etc.

The other thing we looked at was to see whether it actually changed your appraisal scores when you got a new supervisor, because the other view is that you get comfy with a particular supervisor, then your scores are always sort of the same. You get a new supervisor, and they can really sort out whether you’re good or bad. Well, we didn’t see that either.

Knowledge at Wharton: I can see that happening on both sides. Either you have a supervisor that you just don’t get along with right from the start and your performance appraisal would be lower, or it could be the same if you get along with somebody.

Cappelli: We didn’t go in with a prior expectation, saying, “Gosh, this is silly to think that it’s all disposition.” Or, “Boy, it’s almost all random.” We had no idea what we were going to find. In fact, we were looking at that as a way to test the real things we thought were going to be interesting, which was, is it more like a contract or not? It turns out there’s a lot of variation in how people perform. One of the things that this calls into question is forced ranking systems, or what they call “rank and yank” or “rack and whack.” General Electric used to force out the bottom 10% because they believed in the A-player, B-player, C-player model. If your company’s doing that, you might want to actually look to see whether it’s true that your bottom 10% this year are the same as your bottom 10% next year. The problem is, if you keep firing your bottom 10%, you’re never going to know because you’ll never know what those guys would have done. But you could at least look at the appraisal scores for everybody else and see whether they remain constant over time. If they’re bouncing around a lot, it is insane to fire the bottom 10% because there’s no reason to think those guys are going to be bad next year.
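The check Cappelli suggests could be sketched roughly as follows. This is an illustrative outline, not the method used in the study: the function names, the employee IDs and the rule for picking the "bottom 10%" (lowest tenth by score, at least one person) are all assumptions. It only covers employees who have scores in both years, which is exactly the limitation he describes for companies that have already fired their lowest scorers.

```python
def bottom_decile(scores):
    """Return the IDs of the lowest-scoring 10% of employees (at least one).

    `scores` maps a hypothetical employee ID to an appraisal score.
    """
    count = max(1, len(scores) // 10)
    ranked = sorted(scores, key=scores.get)  # lowest score first
    return set(ranked[:count])


def bottom_overlap(this_year, next_year):
    """Fraction of this year's bottom 10% who are also in next year's.

    Only employees with a score in both years are compared.
    """
    common = this_year.keys() & next_year.keys()
    first = bottom_decile({e: this_year[e] for e in common})
    second = bottom_decile({e: next_year[e] for e in common})
    return len(first & second) / len(first)


# Hypothetical scores for ten employees in two consecutive years.
year_1 = {f"emp{i}": float(i) for i in range(10)}
year_2 = {f"emp{i}": float(9 - i) for i in range(10)}
print(bottom_overlap(year_1, year_2))
```

If the overlap is consistently low, the people at the bottom are changing from year to year, which is the situation in which forcing out the bottom 10% is hardest to justify.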

Knowledge at Wharton: Is that 50th percentile kind of the perfect area?

Cappelli: I don’t know what is good or bad out of this. I think if you believe management matters, you would like to think that it’s not a perfect correlation. You’d like to think that the numbers are a little lower because you could shape it. You could take the same person with a different manager, a different context, and they could perform differently — better or worse. I think it’s encouraging to management as a field that the relationship is lower. But it makes it harder for people running businesses and employers because now it’s not just picking the good people and then getting out of their way.

Knowledge at Wharton: But performance appraisals have seemingly taken on more importance in the last 20, 30 years because the elements of psychology now are factored into business so much. Companies want to know what their employees are thinking, more so than ever before.

Cappelli: Honestly, there was a high watermark of that stuff about 40 years ago or so. For example, AT&T had a team of about 15 psychologists, through 1980, that just tinkered with the performance appraisal form every year. And in the 1960s, performance appraisals were so thorough that you were assessed on the appraisals you gave your subordinates. They would read your appraisals, and if you didn’t do a good job, it affected your appraisal. They’d also see how your subordinates did years later. If they did better in their careers, that affected your own appraisal. They used to take this stuff way more seriously, and we don’t anymore.

Knowledge at Wharton: It has been pared down because of what factors?

Cappelli: There are a couple. The first one is that we’ve given supervisors a ton of other stuff to do. It used to be that your job was to supervise people and that was it. Now, you’re an individual contributor, and you’re supervising these folks. The second thing that’s happened is the span of control has increased. That means the number of people reporting to you. There used to be a rule that six or seven were the most people you ought to supervise. Now, it’s up in the 20s in lots of places. If you’re trying to follow 20 employees and pay attention to what they’re doing and be an individual contributor, it’s almost impossible to pay much attention to them.

Knowledge at Wharton: Did you look at employees involved in sales whose performance could be measured objectively?

Cappelli: We looked at a retail organization, and we could see the store managers. They had 10 attributes that they were assessed on, and I think six of them were actually hard numbers — financial performance, store sales, things like that. So, they were objective numbers, and the performance bounced around a lot. We couldn’t tell who the manager was and go in and interview that manager. As you’d imagine, there were hundreds of managers, so it would have been pretty hard to do. As we were saying before, the best predictors seem to be things which have more to do with bias than with good management practice. Although you would think that good management practice ought to matter. It’s just a little hard to measure in a study.

Knowledge at Wharton: You mentioned that in certain situations that the boss is asking the employee questions such as, “What kind of job did I do over the course of the year?”

Cappelli: You know, 360-degree feedback is the formal way in which that gets done, where you ask people all the way around you, “How do you think I did as a boss?” That has not had a terrific track record, partly because there’s a lot of venting going on. If you’re a subordinate, it’s hard to be objective about your boss. It’s hard, always, to like your boss.

Let me tell you the punch line of what we found on the academic side: Things don’t look very much like a contract, and supervisors tend to reward people for improvements as well as the level of performance. So, if you’re doing better this year than last year, it’s not like, “You did well. You get this much.” Also, counter to the prevailing view, they over-reward high performers. It’s not a linear relationship. If you’re a poor performer, they really do whack your merit pay increases. And if you’re a better performer, they really do load them up.

Knowledge at Wharton: And that increases that separation between the upper end and the lower end, in the course of the job.

Cappelli: Within the job, that’s right. And it is true that as you move up the organization, the average scores increase. Why is that? There are two possible explanations. One is that you’re selecting better people if it’s promotion from within. So, it’s not surprising that the scores would go up. Or is it bias once you get near the top? They say that CEOs always give their personal assistants the top score.

Knowledge at Wharton Podcast

May 23, 2016 • 22 min listen
