How Data Science Can Win the Debate on Police Reform

The recent deaths of George Floyd, Breonna Taylor, Philando Castile and a number of other Black citizens who have lost their lives during an interaction with law enforcement officers have put the topic of police reform at the top of the national agenda. The phrase “defund the police” became a rallying cry across many American cities over the summer as protesters marched for social justice. But police reform isn’t as simple as a catchphrase. Two scholars who have been studying the issue for years say solving the problem starts with better data. Dean Knox is a Wharton professor of operations, information and decisions, and Jonathan Mummolo is a professor of politics and public affairs at Princeton. They’ve published several papers on racial bias in policing and, with the help of Analytics at Wharton, co-founded Research on Policing Reform and Accountability, an organization dedicated to bringing academic rigor and science to what is often a very emotional debate. Knox and Mummolo recently spoke with Knowledge at Wharton about their research and what decision-makers can do to effect change.

An edited transcript of the conversation follows.

Knowledge at Wharton: There is disagreement on even the most fundamental aspects of racial bias in policing. Why?

Dean Knox: These cases — Breonna Taylor, George Floyd, so many others — grab our attention because the facts are just so outrageous. That’s important because it brings attention to this important issue. But for those of us who are seeking to reform policing, we need to keep in mind that these are instances of what is unfortunately a very common problem. Last year alone, police were responsible for almost 2,000 known deaths in America. It’s a leading cause of death among young men, right after heart disease and cancer. That burden of police killings falls on Black men, especially. [Rutgers University professor] Frank Edwards and co-authors have shown that this group is two and a half times more likely to be killed by police than white men. For those of us who are studying this whole system, we want to know how we got to this point. The answer turns out to be complicated because bias manifests at so many levels in the system.

Are officers discriminating in the moment when they pull the trigger or use force, either unconsciously or consciously? Or is the bias coming earlier, when officers are discriminating in who they detain — like stopping minorities for jaywalking or suspicious behavior, whatever that is? We know that if you don’t stop somebody, then the encounter doesn’t escalate to the point of violence, so both of these levels matter. Maybe the bias comes even earlier, in where cities and police departments choose to send officers, over-policing minority neighborhoods so that people have more encounters with police officers in minority neighborhoods than white neighborhoods. The problem is we know far less than we need to about these questions. What we do notice is that we’re massively underestimating the problem because of data limitations and the poor quality of existing statistical analyses.

Knowledge at Wharton: What are the constraints of data in the research, and how do you go about solving that problem? How do you quantify racism?

Jonathan Mummolo: Our first step is that we conceptualize racial bias in policing as a causal question. That is, we seek to test whether, on average, civilians of one racial group who get treated a certain way by police would be treated differently if we substituted otherwise comparable civilians of another racial group into the same encounter — sort of holding all else equal. That’s obviously a very difficult statistical problem. But part of the problem with solving it is that, until now, much of the literature has not even defined this question in concrete statistical terms. It doesn’t define what exact statistical quantity we’re testing for or what’s the unit of analysis. Is it the civilian, is it the officer, is it the encounter? And what are the assumptions we’d need to make in order to move from inferring a correlation to inferring a causal link? Much of our work has focused on clarifying these building blocks, because until you know what they are, you’re just flying blind in terms of the statistics.

Once you have those things defined, you need to locate detailed data on police-civilian interactions that allow you to account for objective circumstances in which police encounter civilians. It’s things like time of day and what’s going on when police first see a civilian, how they’re behaving. These are things that may differ across racial groups in ways that could affect how police end up treating civilians. Knowing these details can go a long way toward making apples-to-apples comparisons that allow us to isolate the role of race, or at least come close to doing that.

The problem is that many places don’t track that sort of detailed data. Even in places that do, there’s this other problem that Dean alluded to, which is that police only record activity on enforcement events like stops or arrests. In our work, we show that if there’s racial bias in these initial detainment decisions, then simply comparing some outcome, like the use of force, using data on these detainments will underestimate discrimination, because we’ll be missing that entire source of bias that went into stopping someone in the first place.

We have developed ways to correct for that fact to show the range of possible discrimination in things like the use of force, accounting for bias in initial decisions like stopping. I would say an overarching theme of our work is that police-civilian encounters are a complex, multi-stage process, and race can play a role in each part. Without accounting for all those stages, it’s very easy to get incorrect or even completely misleading estimates of racial bias in police behavior.
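The underestimation described above can be made concrete with a toy calculation. This is only an illustrative sketch, not the correction method Knox and Mummolo developed. Every probability below is a hypothetical number chosen for the example, and the names `P_STOP`, `P_FORCE` and `rates` are invented here. The setup assumes both groups behave identically, that police stop Black civilians more often for ordinary behavior, and that force is more likely against Black civilians in every behavior category.

```python
# Toy numbers, all hypothetical: chosen for illustration, not estimated from data.
P_SUSPICIOUS = 0.10  # same share of "suspicious" behavior in both groups

# Probability that an encounter becomes a recorded stop, by race and behavior.
P_STOP = {
    "white": {"suspicious": 0.50, "ordinary": 0.02},
    "black": {"suspicious": 0.50, "ordinary": 0.10},  # biased stops of ordinary behavior
}

# Probability that force is used once a stop happens, by race and behavior.
P_FORCE = {
    "white": {"suspicious": 0.20, "ordinary": 0.02},
    "black": {"suspicious": 0.30, "ordinary": 0.03},  # biased force in every category
}


def rates(group):
    """Return force rates per recorded stop and per police-civilian encounter."""
    share = {"suspicious": P_SUSPICIOUS, "ordinary": 1 - P_SUSPICIOUS}
    stopped = sum(share[b] * P_STOP[group][b] for b in share)
    forced = sum(share[b] * P_STOP[group][b] * P_FORCE[group][b] for b in share)
    return {"force_per_stop": forced / stopped, "force_per_encounter": forced}


white = rates("white")
black = rates("black")

# What stop records alone would show: force per recorded stop.
print(f"force per stop:      white {white['force_per_stop']:.3f}  black {black['force_per_stop']:.3f}")
# What the records cannot show: force per encounter, stopped or not.
print(f"force per encounter: white {white['force_per_encounter']:.4f}  black {black['force_per_encounter']:.4f}")
```

With these made-up inputs, the stop records show a lower rate of force per stop for Black civilians (about 13% versus 15%), even though force is more likely against them in every behavior category and per encounter overall. The biased stops add many low-risk encounters to the Black stop pool, which dilutes the measured rate.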

Knox: I think that’s exactly the problem that we’re facing and exactly why statistical inference is so hard in this setting. It’s also because people have only looked at isolated aspects of this complicated problem. It’s why we see researchers use the exact same data set, analyze it with different methods and come to sometimes entirely opposite conclusions, which has really muddied the water on this important policy debate. Our goal is just to try to bring some clarity to all of this so that policymakers can make informed decisions.

Knowledge at Wharton: You’re saying that better data can help with police reform, but data analysis comes after the fact. What about the upfront work in hiring and training that would help root out bias in police departments? How can your research be useful?

Knox: We don’t see these things as contradictory. I think retrospective data analysis can help inform future practices. In training, there’s a lot that we don’t know. This is something that we’re working on in a new collaboration with the Illinois Holocaust Museum, which provides training to the Chicago Police Department and is working to improve how CPD trainees learn about civil rights, ethics, bias, things like that. We’re trying to evaluate the impact of their efforts, but that project is just getting started. In terms of tactics, Jonathan has done a lot of work on this. We have reason to believe that we can dial back or eliminate many controversial aggressive tactics because their purported benefits just aren’t supported by the data. I’m thinking of things like police militarization and no-knock SWAT raids, which the data show seem to damage police from a public perception angle but don’t actually make officers safer, which was the promise going in.

On the hiring side, we have a new preprint about the role of diversification, which is one of the oldest proposed hiring reforms in policing. We’re using the most fine-grained policing data that’s ever been collected, again looking at Chicago. What we can say is that diversity does seem to help. If you’re a unit commander and you’re deciding whether to send a Black officer or a white officer into the field to patrol a particular beat, what we’ve shown is that you can expect substantially fewer stops and arrests from the Black officer, mostly in terms of discretionary enforcement. There’s a smaller number of drug arrests or stops for suspicious behavior, whatever that means.

But if you look at these officer groups, the way that they police serious crimes and things like violent crime arrests, we don’t see much of a difference. That’s really telling us that the gap in the treatment of civilians is coming on the discretionary side. Similarly, we’ve found that female officers use far less force. It’s not a randomized controlled trial that lets us directly infer whether this is going to have an impact in future hiring, but it is the first step. It’s the first thing that we’d want to know if we’re trying to evaluate the promise of diversification, which is being widely considered right now. The reason we’re able to draw these conclusions is that, for the first time, with the data we’ve collected, we’re able to compare officers in exactly the same district, exactly the same beat and the same collection of city blocks, same time of day, same shift. Even though we don’t know exactly how the civilians that they observe are behaving, we do know that the officers that we compare are facing a similar pool of civilian behavior. So, the best evidence we have right now says diversification can help, with the caveat that we need data for many more places beyond just Chicago.
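The idea of comparing officers only within the same beat and shift can be sketched in a few lines. This is a simplified illustration, not the analysis in the preprint Knox mentions. The records, the field layout and the function name `within_stratum_gap` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (beat, shift, officer_group, stops_made_on_that_shift).
records = [
    ("beat_1", "day", "black", 2), ("beat_1", "day", "white", 4),
    ("beat_1", "night", "black", 5), ("beat_1", "night", "white", 7),
    ("beat_2", "day", "black", 1), ("beat_2", "day", "white", 1),
    ("beat_2", "day", "white", 3),
]


def within_stratum_gap(rows, group_a="black", group_b="white"):
    """Average, over strata that contain both groups, of mean(a) - mean(b)."""
    strata = defaultdict(lambda: defaultdict(list))
    for beat, shift, group, stops in rows:
        strata[(beat, shift)][group].append(stops)
    gaps = [
        mean(cells[group_a]) - mean(cells[group_b])
        for cells in strata.values()
        # Skip strata where one group is absent: no like-for-like comparison exists.
        if cells.get(group_a) and cells.get(group_b)
    ]
    return mean(gaps)


gap = within_stratum_gap(records)
print(f"average within-stratum gap in stops: {gap:.2f}")
```

Because each difference is taken inside a single beat and shift, officers are compared only with colleagues who faced a similar pool of civilians. A negative gap means the first group made fewer stops under matching conditions.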

Mummolo: Another promising avenue that we’ve seen in our work is that supervisor oversight can play an immediate role in changing police behavior. I’ve done a study in New York looking at stop-and-frisk during the mid-2000s, which was out of control in its volume during that period. Upwards of 90% of people being stopped on the street were found to be guilty of no crime, and the vast majority of these stops were of young Black and Latinx men. In this study, I exploit the fact that there was a sudden change to how oversight was conducted, where police had to write down in paragraph form exactly what led them to stop each individual and then at the end of their shift show that justification to their supervisors. After years of 97% of stops not producing evidence of the suspected crime, we see basically a doubling of the rate at which stops are ex-post justified by evidence. In other words, they’re making more of the right kind of stops and fewer of the wrong kinds of stops, simply as a result of having to justify their actions to their superiors.

Knowledge at Wharton: What does that change indicate to you as a researcher?

Mummolo: It draws on a lot of lessons we’ve had for a long time about how bureaucratic politics works, which is that we have this problem between managers wanting one thing and employees perhaps behaving in ways that are different. We tend to call that a principal-agent problem. In policing agencies, this problem is very pronounced because officers work out of sight, out of the reach of their commanders. One of the ways that it’s been theorized that you can help to solve these sorts of problems is to increase oversight, increase monitoring and increase the threat of sanction if employees don’t behave in the ways that you want.

There are two problems here. One is that supervisors have to want the right things. They have to be telling their officers that you shouldn’t be stopping people unnecessarily, which was not the message they were giving in New York for a long time. Second, they have to check up on these things so that officers realize that if they don’t behave as intended, there might be some consequence. I think what the study shows is that officers in many ways are like a lot of other bureaucrats that we’ve studied for a long time. They respond to incentives, they want a paycheck, they want to please their boss or at least not get in trouble. We can use some of those same managerial tools to change behavior.

Knowledge at Wharton: With the help of Analytics at Wharton, you have co-founded an organization called Research on Policing Reform and Accountability. One of the goals is to push back against what you describe as bad science. What does that mean?

Mummolo: We take a quantitative approach to the study of policing that focuses on the careful use of statistical methods to quantify things like racial bias. Unfortunately, the literature on policing and racial bias has a lot of variance in terms of quality. Some work is very well done but often it makes fundamental errors that violate axioms of data analysis. Occasionally, these errors slip through the peer review process and gain notoriety. When that happens, we seek to comment on this work, loudly if necessary, to clarify to the public and policymakers that the evidence being presented does not match the claims being made.

I don’t mean to say we critique work just because it arrives at a particular conclusion. We are open to following the evidence wherever it leads. But we need to call out errors in analysis when they arise, or we risk having these important debates tainted by faulty evidence and also risk the credibility of science being threatened so that it won’t be taken seriously when it needs to be.

As one example, we spent a substantial amount of time over the past year critiquing a study that was published in The Proceedings of the National Academy of Sciences, one of the most prominent journals in the world. They claimed there was no racial bias in police shootings, and that study was recently retracted after a year of our critiques. The study was widely cited in the media and in Congressional hearings, but when you stripped away all the statistical jargon, it became clear that all it was really showing was that most people who get shot by police are white.

We live in a majority white country, so that is not an informative statement about racial bias. What we want to know is, of the times white and Black civilians encounter police, how often is each group shot? Then we’d want to adjust for relevant differences between those encounters to try to isolate the role of race. But for that, you need data on all sorts of police encounters, not just records of fatal shootings. And that’s all this study analyzed, so it was a totally misleading study that ended up contaminating a very important conversation. As scholars seeking evidence-based solutions, we view it as our role to point out these mistakes and try to explain them in ways that are accessible both to experts and the lay public, so they can understand the issues at play. We also publish papers on how to define racial bias statistically and how to use statistics properly to test for it, so researchers can avoid these pitfalls moving forward.

Knox: At the heart of it, some of the problems in this literature are problems of basic logic. The chances of being shot if Black are higher than the chances of being shot if white. But the study in The Proceedings of the National Academy of Sciences isn’t analyzing that. It’s saying, if you’re shot, what are the chances that you’re Black? Again, shot if Black, versus Black if shot. It’s the exact opposite of what we’re looking for. Communicating that point to the general public is a difficult task, given all the statistical jargon around it. But it’s important to make sure that this bad information doesn’t contaminate the debate.
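The difference between "shot if Black" and "Black if shot" is easy to see with a small worked example. The counts below are hypothetical and were picked only to keep the arithmetic simple. They are not taken from the retracted study or from any real data set.

```python
from fractions import Fraction

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
encounters = {"white": 800, "black": 200}  # police-civilian encounters
shootings = {"white": 8, "black": 4}       # encounters that ended in a shooting


def p_shot_given_race(race):
    """Chance of being shot, given an encounter with a civilian of this race."""
    return Fraction(shootings[race], encounters[race])


def p_race_given_shot(race):
    """Share of all people shot who are of this race."""
    return Fraction(shootings[race], sum(shootings.values()))


for race in ("white", "black"):
    print(
        f"{race}: P(shot | {race}) = {p_shot_given_race(race)}, "
        f"P({race} | shot) = {p_race_given_shot(race)}"
    )
```

In this toy example two-thirds of the people shot are white, yet the chance of being shot in an encounter is twice as high for Black civilians (1 in 50 versus 1 in 100). The first figure mostly reflects who makes up the population of encounters. The second is the quantity that speaks to bias, and it requires data on encounters that did not end in a shooting.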

Knowledge at Wharton: It is so hard for the public to wrap their arms around some of the problems you’re talking about, so I imagine it’s also hard for policymakers and police departments. What can police leaders and legislators do right now to tackle reform?

Knox: The first thing that we need is data transparency — things like disclosing how officers are assigned to patrol, what neighborhoods they’re in, how often they’re on the street, who they stop, who they arrest, who they use force against. Sunlight is the best disinfectant, so we need more transparency about civilian complaints and how allegations of officer misconduct are investigated. Work with civilian oversight organizations and neutral third parties to build public trust in the process and allow the general public to have faith that justice is being served when this misconduct comes to light.

There’s a lot that needs to be done, and one of the issues is that racial bias doesn’t manifest in the same way across every agency in America. In one city, it may be discrimination in stop-and-frisk, in which individuals are detained and the discrimination may be against a primarily Black population. In another police department, it could be against Latino drivers who are stopped and targeted for unconstitutional search at a higher rate. Without having the data, it’s difficult to know what the right answer is in any particular place and time.

Mummolo: One thing that can be done is just to reduce or eliminate some of the aggressive tactics that have been employed based on faulty premises. It’s the things we talked about, like stop-and-frisk and militarized policing. The use of SWAT teams, which I would classify as militarized policing, was originally conceived to handle violent emergencies like active shooter scenarios. Now, they’re used every day to serve search warrants. In fact, their use in emergencies is quite a small portion of what they do. The justification is that they help lower crime and protect officers, but when you analyze that in data, that doesn’t show up to be true. I think some of these aggressive tactics that people are worried violate their civil rights and are being used in discriminatory ways can be relaxed or eliminated because the premises on which they’re founded are just faulty.

In other cases, I think we’re seeing evidence that oversight initiatives can help. In a lot of these cases the officers certainly play a role in how bias manifests. But a lot of these are just policy decisions where officers are carrying out orders that are going to cause bias in the system. So, new rules, oversight, relaxation of aggressive tactics, and much more widespread and systematic data collection efforts. We have 18,000 police departments in this country. Most of the really fine-grained data we have on policing comes from a relatively small portion of big-city police departments. Just because we see reform show some benefits in one place doesn’t mean it’s going to work in another context. We need a lot more data from a lot more places to figure out what works.