Whitewashing The Data

Daniel Lomax
11 min read · Jun 20, 2020

Yes, Black Lives Matter has science on its side.

Eight days after the grotesque killing of George Floyd, amid an explosion of both violent and peaceful protest across several countries, Wall Street Journal writer Heather Mac Donald became the first high-profile conservative to attempt to tackle the Black Lives Matter movement on the grounds of data. Her piece, titled The Myth of Systemic Police Racism, explained to the less-informed that a “solid body of evidence finds no structural bias in the criminal-justice system with regard to arrests, prosecution or sentencing” and that “crime and suspect behavior, not race, determine most police actions”. A week later Sam Harris, neuroscientist, pioneer of 21st-century atheist literature, political liberal and member of what has become known as the “Intellectual Dark Web”, presented a podcast in which he drew on scientific literature to explain that there is no racial bias in the decisions of police to shoot to kill, and that these events are at any rate extremely rare. Harris laments that “we are now unable to speak, or even think, about facts”, and concludes that we are

paying an intolerable price for confusion about racism, and social justice generally, and the importance of identity generally, and this is happening in an environment where the path to success and power for historically disadvantaged groups isn’t generally barred by white racists who won’t vote for them, or hire them, or celebrate their achievements, or buy their products, and it isn’t generally barred by laws and policies and norms that are unfair. Now, there is surely still some of that, but there must be less of it now than there ever was.

Harris makes several concessions to the need for social, economic and criminal justice reforms, but maintains that “the current problem of police violence seems a perfect case in point” that allegations of racism are impertinent to calls for progress.

Fast-forward another week, to Juneteenth, and black conservative writer Coleman Hughes penned an essay arguing that the body of evidence does not support the notion of racial bias in homicides by police. Hughes elaborates:

You might agree that the police kill plenty of unarmed white people, but object that they are more likely to kill unarmed black people, relative to their share of the population. That’s where the data comes in. The objection is true as far as it goes; but it’s also misleading. To demonstrate the existence of a racial bias, it’s not enough to cite the fact that black people comprise 14 percent of the population but about 35 percent of unarmed Americans shot dead by police. (By that logic, you could prove that police shootings were extremely sexist by pointing out that men comprise 50 percent of the population but 93 percent of unarmed Americans shot by cops.)

Because there’s been little discussion of data from the political Left, these arguments appear to have gone largely unchecked. I’ll take these in order, saving the best until last. The first thing to note about Heather Mac Donald’s piece is that she doesn’t just deny the existence of a shooting bias, but of structural racism altogether. She notes that “In 2018, the latest year for which such data have been published, African-Americans made up 53% of known homicide offenders in the U.S. and commit about 60% of robberies, though they are 13% of the population.” Taken together, these positions amount to what Natalie Wynn characterises as “black people simply decide to commit crimes for no reason”. I’ll come back to that. She argues that:

“If the Ferguson effect of officers backing off law enforcement in minority neighborhoods is reborn as the Minneapolis effect, the thousands of law-abiding African-Americans who depend on the police for basic safety will once again be the victims.”

Note the systemic cognitive bias here: when blacks commit crime at high rates it’s their fault for doing it, even though they can’t rely on a police presence to resolve disputes (something Mac Donald herself covered in an earlier article, Neighborhood Knucklehead, quoted at length in Steven Pinker’s Better Angels Of Our Nature). But when police stop responding to black neighbourhoods that’s black people’s fault too, because they made police afraid to do the jobs they signed up for.

Mac Donald cites two studies in total: one by Roland G Fryer and one by Johnson et al. Fryer, for a start, looks at only one city (Houston) over ten years (2005–2015), meaning that he’s working with a very small data set relative to the national picture. His piece only assesses what’s known in economics as “statistical discrimination” (which is not the same as racial bias). Justin Feldman explains:

“[…] economic theory holds that police want to maximize the number of arrests for the possession of contraband (such as drugs or weapons) while expending the fewest resources. If they are acting in the most cost-efficient, rational manner, the officers may use racial stereotypes to increase the arrest rate per stop. This theory completely falls apart for police shootings, however, because officers are not trying to rationally maximize the number of shootings. The theory that is supposed to be informing Fryer’s choice of methods is therefore not applicable to this case. He seems somewhat aware of this issue.”

Feldman goes on to note:

[…] there is an even more fundamental problem with the Houston police shooting analysis. In a typical study, a researcher will start with a previously defined population where each individual is at risk of a particular outcome. For instance, a population of drivers stopped by police can have one of two outcomes: they can be arrested, or they can be sent on their way. Instead of following this standard approach, Fryer constructs a fictitious population of people who are shot by police and people who are arrested. The problem here is that these two groups (those shot and those arrested) are, in all likelihood, systematically different from one another in ways that cannot be controlled for statistically.

Also note another of Fryer’s concessions:

A final disadvantage, potentially most important for inference, is that all observations in the OIS data are shootings. In statistical parlance, they don’t contain the ‘zeros’ (e.g., set of police interactions in which lethal force was justified but not used). To the extent that racial bias is prevalent on the extensive margin — whether or not someone is ever in an officer-involved shooting — these data would not capture it.

This is a problem of dummy variables: in order to show trends, and particularly to parse them from regression analyses, you need to figure out what value substitutes in for the “zeros” so as not to distort the trend. Fryer is conscious that he can’t do this because he doesn’t have the necessary data.

The paper also notes that “blacks are almost 4 times more likely to be stopped by police relative to their population proportion” — note that blacks don’t commit 80% of the crime even on police estimates. In light of this it seems a somewhat odd choice of study for somebody to cite if they are trying to convince Wall Street Journal readers that structural racism is a “myth”.

Image displayed during the Black Lives Matter protest in Coventry, UK on 07/06/2020.

Mac Donald’s second source is from the Proceedings of the National Academy of Sciences. This study is explicit about not measuring the probability of being shot given the ethnicity of those involved [Pr(shot|race)] but rather your probability of being some given ethnicity if you’ve shot or been shot [Pr(race|shot)]. To confuse the two is what logicians call “affirming the consequent”, and Johnson was taken up on this in a reply by Knox & Mummolo, in a more recent paper (nonetheless released well in time for Mac Donald to have mentioned it).

Now in Johnson’s defence he did publish a part-clarification, part-retraction of his piece. He admits the error but notes that you can convert between the two values using Bayes’ rule:

P(x|y) = P(y|x) · P(x) / P(y)

provided you reliably know P(x) and P(y). It’s worth noting firstly that this produces a radically different number and secondly that in this case both values are drawn from police data.
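
To make the conversion concrete, here is a minimal Python sketch of Bayes’ rule applied to this case. The function name and every input number are hypothetical placeholders chosen for illustration, not figures from Johnson et al. or from any police database. The point is only to show how much the converted value depends on the base rates you plug in.

```python
def invert_conditional(p_y_given_x: float, p_x: float, p_y: float) -> float:
    """Bayes' rule: P(x|y) = P(y|x) * P(x) / P(y)."""
    if not 0 < p_y <= 1:
        raise ValueError("P(y) must be in (0, 1]")
    return p_y_given_x * p_x / p_y


# Hypothetical inputs (assumptions for illustration only).
p_race_given_shot = 0.30  # Pr(race | shot): share of those shot who are of a given race
p_shot = 0.001            # Pr(shot): share of all encounters that end in a shooting
p_race = 0.15             # Pr(race): share of all encounters involving that race

# Pr(shot | race) = Pr(race | shot) * Pr(shot) / Pr(race)
p_shot_given_race = invert_conditional(p_race_given_shot, p_shot, p_race)

# Same Pr(race | shot), but a different assumed base rate Pr(race).
p_shot_given_race_alt = invert_conditional(p_race_given_shot, p_shot, 0.30)

print(f"Pr(shot | race) with Pr(race) = 0.15: {p_shot_given_race:.4f}")
print(f"Pr(shot | race) with Pr(race) = 0.30: {p_shot_given_race_alt:.4f}")
```

Doubling the assumed base rate Pr(race) halves the resulting Pr(shot|race), so the conversion is only as trustworthy as the source of those base rates.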

Both Fryer and the PNAS study’s authors lament, on that note, having had to rely on databases compiled by the police because the police are not a reliable source when it comes to their own malpractice. This is a problem of both “reliability” and “usefulness” — if we’re measuring for racial bias specifically based on data gathered by the people we’re measuring racial bias in, we’re at risk of circularity. Mac Donald fails to note this despite the fact that, to take the unmissably high-profile example, George Floyd’s death was reported by the police as nonviolent.

The PNAS study also only uses a handful of cities/counties in a country of thousands. It’s not impossible to get an almost-geographically-complete dataset — John Lott managed it for almost every county in America for his analysis of whether gun control leads to less or more violent crime.

There are multiple and larger studies of the same phenomena which do find a — sometimes significant — racial bias. (For example: this one, this one, this one and this one). Of course the way to do science isn’t just to cherry-pick two studies for two arbitrarily defined cities/periods and tout them as proof, but to look at the preponderance of evidence. One way to do this is with literature reviews, which can map the comparative weight of studies. In fact a literature review has been done and found an anti-black racial bias.

The two studies Sam Harris cites happen to be the same two studies Heather Mac Donald included in her article, so I won’t repeat the same counter-arguments again. (That’s not because they’re the only two studies on the subject.) I will say for Harris that unlike Mac Donald he seems to have followed up by actually reading the studies he cites, even if he makes some of the same errors of interpretation. His recording shows a greater grasp of nuance and an understanding that police shootings are not the only possible indicator of structural racism. For instance he makes remarks that are clearly in line with Fryer’s conclusion:

“Much more troubling, due to their frequency and potential impact on minority belief formation, is the possibility that racial differences in police use of non-lethal force has spillovers on myriad dimensions of racial inequality. If, for instance, blacks use their lived experience with police as evidence that the world is discriminatory, then it is easy to understand why black youth invest less in human capital or black adults are more likely to believe discrimination is an important determinant of economic outcomes. Black Dignity Matters.”

Coleman Hughes writes:

“[…] you must do what all good social scientists do: control for confounding variables to isolate the effect that one variable has upon another (in this case, the effect of a suspect’s race on a cop’s decision to pull the trigger). At least four careful studies have done this — one by Harvard economist Roland Fryer, one by a group of public-health researchers, one by economist Sendhil Mullainathan, and one by David Johnson, et al.”

Which is to say, the same two mentioned by Mac Donald and Harris, plus two more. The one by the economist isn’t actually a study (it’s a short New York Times piece) so that makes “at least three careful studies”, but I’ll deal with it anyway. Sendhil Mullainathan lays it out as follows:

For the entire country, 28.9 percent of arrestees were African-American. This number is not very different from the 31.8 percent of police-shooting victims who were African-Americans. If police discrimination were a big factor in the actual killings, we would have expected a larger gap between the arrest rate and the police-killing rate.

The first thing I should point out is that the gap here is potentially bigger than it seems. Suppose the police arrested 10,000 people: 28.9% (2,890) of them black and 71.1% (7,110) of them non-black. Now suppose (to make it simple) that 1,000 of those arrestees are shot, 318 blacks and 682 non-blacks, which will mean that 31.8% of those shot were black (as stated). This means that 11.00% of the black arrestees were shot (318/2,890) and 9.59% of the non-black arrestees were shot (682/7,110). So for every 9.59 non-blacks shot per hundred arrestees, 11 blacks are shot. The math here is simple: 11.00/9.59 ≈ 1.147. This actually means that black arrestees are about 15% more likely to be shot.
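
The same arithmetic can be checked in a few lines of Python. This is a sketch using only the two percentages Mullainathan quotes. The totals of 10,000 arrests and 1,000 shootings are hypothetical, chosen to keep the numbers round, and the variable names are illustrative.

```python
# The two shares quoted by Mullainathan.
black_arrest_share = 0.289  # share of arrestees who were African-American
black_shot_share = 0.318    # share of police-shooting victims who were African-American

# Hypothetical totals (assumptions); the final ratio does not depend on them.
total_arrested = 10_000
total_shot = 1_000

black_arrested = round(total_arrested * black_arrest_share)
nonblack_arrested = total_arrested - black_arrested
black_shot = round(total_shot * black_shot_share)
nonblack_shot = total_shot - black_shot

# Shootings per arrestee within each group.
black_rate = black_shot / black_arrested
nonblack_rate = nonblack_shot / nonblack_arrested

# Relative risk of being shot, black versus non-black arrestees.
risk_ratio = black_rate / nonblack_rate

print(f"black arrestees shot:     {black_rate:.2%}")
print(f"non-black arrestees shot: {nonblack_rate:.2%}")
print(f"risk ratio:               {risk_ratio:.3f}")
```

A gap of about three percentage points between the two shares therefore corresponds to a relative risk roughly 15% higher, whatever totals are assumed.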

But to what extent is it meaningful to compare the two percentages Mullainathan gives us? Arrests are not a proxy for encounters, so it doesn’t tell us anything. If police are responding differently to encounters with people of different ethnicities then we can’t control for one behaviour in terms of the other — this is one of the criticisms Feldman makes of Fryer, above.
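
A toy example shows how benchmarking against arrests can hide a disparity. Every count below is invented for illustration, and groups "A" and "B" are hypothetical. The sketch assumes both groups have the same number of police encounters, but group B is both arrested and shot twice as often per encounter.

```python
# Hypothetical counts (assumptions for illustration only).
encounters = {"A": 100_000, "B": 100_000}
arrests = {"A": 5_000, "B": 10_000}
shootings = {"A": 10, "B": 20}


def share(counts: dict, group: str) -> float:
    """Fraction of the total count that belongs to the given group."""
    return counts[group] / sum(counts.values())


# Benchmark comparison: group B's share of arrests vs. its share of shootings.
arrest_share_b = share(arrests, "B")
shot_share_b = share(shootings, "B")

# Per-encounter comparison: shootings divided by encounters for each group.
rate_a = shootings["A"] / encounters["A"]
rate_b = shootings["B"] / encounters["B"]
per_encounter_ratio = rate_b / rate_a

print(f"B's share of arrests:    {arrest_share_b:.1%}")
print(f"B's share of shootings:  {shot_share_b:.1%}")
print(f"per-encounter risk, B/A: {per_encounter_ratio:.1f}")
```

In this invented scenario the two shares match exactly, so the arrest benchmark shows no gap at all, even though a member of group B is twice as likely to be shot in any given encounter. The disparity is absorbed into the benchmark itself.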

I did say I’d save the best until last, and the study by the public health researchers seems pretty solid. It doesn’t rely on the police force’s own database of recorded fatalities; it doesn’t start out purely from fatalities and then enumerate people by ethnicity; there are no embedded observations that would prove the opposite of what the citing author (in this case, Coleman) is trying to prove. It does note the imperfections of the data set but all studies must work with this. Coleman Hughes doesn’t try to claim more for it than he reasonably can: he qualifies his citation by saying “Of course, that hardly settles the issue for all time; as always, more research is needed.” He does however go on to say:

But given the studies already done, it seems unlikely that future work will uncover anything close to the amount of racial bias that BLM protesters in America and around the world believe exists.

How unlikely is it? The fact remains that there is a wealth of studies on this subject and many of them do in fact find an anti-black shooting bias in police activity. (For example: this one and this one. The first of these is large and thorough, though again, like all studies, imperfect.) There is psychology literature suggesting that officers are more likely to open fire on black suspects.

While Mac Donald explains away structural racism in its entirety, Harris and Hughes get something right: there needs to be more emphasis on indicators of structural bias that go beyond the extreme case of lethal police brutality. If you want to know how ethnic minorities are viewed in police culture, check out the website that was developed specifically for them to talk among themselves — you’ll find images of black people depicted as apes; violent rage; explicit racism and more. Harris acknowledges that black citizens are more likely to be handled roughly. There are broader social issues such as the legacy of redlining, Rachmanism, blockbusting, restrictive covenants and predatory lending. There are policies designed with both the intent and effect of imprisoning black people. There’s mass incarceration — 1% of America’s African-American population is now behind bars — racial gerrymandering, and more. It won’t do to tackle one problem in isolation. We have to look at the interconnecting obstacles ethnic minorities face, and the ways in which one thing, as always, leads to another.

Since Harris insists we need a dialogue and not a monologue I should point out that Fryer’s work got him an op-ed in the New York Times, whereas none of the aforementioned criminologists who responded to it got the same opportunity. Similarly, Harris mentions his work and not theirs. If it’s finally time for America to have a conversation about this, it would be good to see a real dialogue. We can hope that a supporter of the Black Lives Matter movement will be invited to make a case for all of this in the not-too-distant future.
