
> Before describing the details of the framework, the high-level intuition is that we are reweighting the Nexar data sample, which is sampled from a non-representative set of locations, so that it matches the locations where different demographic groups actually live. For example, to calculate the police deployment levels that Asian residents of New York City experience, we reweight the original data sample to upsample neighborhoods with larger Asian populations.

> Overall, our estimation procedure compensates for two types of potential bias. Equation 2 compensates for a data bias, reweighting the Nexar dataset (which is sampled from a set of locations which does not necessarily match the population distribution; Figure 1) to match the population distribution of demographic subgroups. This is conceptually similar to inverse propensity weighting procedures [4] which are used to compensate for non-representative data in other settings. Equation 3 compensates for imperfect model performance, and allows us to check that model performance is unbiased (i.e., calibrated) across demographic subgroups.

Section 4.1 goes into the math they use to correct for the non-representative sampling in the data set.
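
To make the reweighting intuition from the quote concrete, here is a minimal sketch. It is not the paper's Equation 2, just a population-weighted average under simplifying assumptions: the neighborhood names, rates, image counts, and population figures are all made up, and `reweighted_exposure` is a hypothetical helper. It also ignores the model-calibration correction the quote attributes to Equation 3.

```python
def reweighted_exposure(police_rate, group_population):
    """Average a per-neighborhood rate, weighting each neighborhood
    by how many members of the group live there."""
    total = sum(group_population.values())
    if total == 0:
        raise ValueError("group has no residents in the listed neighborhoods")
    weighted = sum(police_rate[n] * pop for n, pop in group_population.items())
    return weighted / total


# Hypothetical inputs, for illustration only.
# Fraction of images in each neighborhood that show a police vehicle.
police_rate = {"A": 0.02, "B": 0.01, "C": 0.04}
# The sample is skewed: most images come from neighborhood A.
image_counts = {"A": 9000, "B": 500, "C": 500}
# Where members of one demographic group actually live.
group_population = {"A": 1000, "B": 1000, "C": 8000}

# Naive estimate: weight by where the images happened to be taken.
naive = sum(police_rate[n] * image_counts[n] for n in police_rate) / sum(
    image_counts.values()
)

# Reweighted estimate: weight by where the group lives.
reweighted = reweighted_exposure(police_rate, group_population)

print(f"naive={naive:.4f} reweighted={reweighted:.4f}")
```

With these made-up numbers the naive estimate is dominated by neighborhood A, where most of the images were taken, while the reweighted one is dominated by neighborhood C, where most of the group lives.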

Section 3.2 describes the data set and how it is geographically distributed.

> Data was provided to us by Nexar in two phases. Phase 1 consists of 3,987,835 images sampled prior to September 1 2020, and is extremely geographically and temporally skewed. Geographically, it is concentrated within the boroughs of Manhattan and Brooklyn, and does not contain data from the boroughs of Staten Island, Queens, and the Bronx at all; temporally, it overrepresents data from Thursday nights. Phase 2, which constitutes the majority of the dataset, consists of 20,816,019 images sampled after October 4 2020, and is much more geographically and temporally representative: it is sampled at all times of the day, on all days of the week, and also covers the entire geographic area of New York City.

> Because Phase 2 is much more representative than Phase 1, we conduct our primary analysis of disparities using only data from Phase 2. We additionally conduct numerous validations and bias corrections, described in §4.1, to compensate for non-representative sampling in the dataset. Geographic and temporal coverage during the Phase 2 period is very good. Specifically, 100% of hours during the Phase 2 period are covered; 99.6% of Census Block Groups (CBGs) have at least one image, with a mean of 168.2 images per CBG; 88% of roads contained within the borders of New York City are covered by at least one image, using data from OSMNX [6]. Figure 1 summarizes geographic data availability; Figure S1 summarizes temporal data availability.