by TheDong 1621 days ago
> Why not include the top x cities, instead of just one 200 miles away? Because it rains a lot in both?

You can find the source here: https://doi.org/10.1016/j.ehb.2020.100856

To quote from it:

> Portland, OR, was selected as the comparison site for Seattle, WA, based on Mahalanobis distance matching to evaluate the four largest municipalities in each of Washington and Oregon as potential comparison sites [etc]

So they had a stated reason beyond rainfall, and they described the matching method they used to pick the city.
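For anyone unfamiliar with Mahalanobis distance matching, here's a rough sketch of the general idea, not the paper's actual code. The site names and the two covariates are made-up toy numbers; the study's real covariates and candidate list are in the paper. The idea is to measure how far each candidate is from the target across several covariates at once, scaling by the covariance so that correlated or high-variance covariates don't dominate.

```python
import numpy as np

def mahalanobis_rank(target, candidates, names):
    """Rank candidate sites by Mahalanobis distance to the target site."""
    # Estimate the covariance from all sites (target plus candidates).
    all_sites = np.vstack([target, candidates])
    cov = np.cov(all_sites, rowvar=False)
    # The pseudo-inverse keeps this usable if the covariance is near-singular.
    inv_cov = np.linalg.pinv(cov)
    diffs = candidates - target
    # Squared distance per candidate: diff^T * inv_cov * diff
    d2 = np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs)
    dists = np.sqrt(np.maximum(d2, 0.0))
    order = np.argsort(dists)
    return [(names[i], float(dists[i])) for i in order]

# Hypothetical toy data: two made-up covariates per site.
target = np.array([10.0, 5.0])
candidates = np.array([
    [10.1, 5.1],  # site_A: nearly identical to the target
    [20.0, 1.0],  # site_B
    [3.0, 9.0],   # site_C
    [15.0, 7.0],  # site_D
])
names = ["site_A", "site_B", "site_C", "site_D"]

ranked = mahalanobis_rank(target, candidates, names)
best_match = ranked[0][0]
print(ranked)
```

The closest candidate (first in `ranked`) would be the one picked as the comparison site. With only a handful of sites the covariance estimate is noisy, so treat this as an illustration of the technique rather than a recipe.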

It's also quite beneficial to keep the areas fairly close together, since there will be less variance in the set of item codes between nearby locations (e.g. different drinks are sold on the East Coast vs. the West Coast, since some brands are local).

From a different section:

> Custom-ordered data were provided from store outlets geocoded within the boundaries of the taxing jurisdiction of Seattle, WA, the comparison site, Portland, OR

I suspect they didn't have enough funding to afford more geocoded scanner data, given that it sounds like they had to pay for custom data at a per-geocoded-location rate... or they didn't have the funding to process that much more data.

It wasn't clear how much of the dataset labeling was manual, but it sounded like the study's authors may have had to sift through several thousand barcodes by hand.

1 comment

The comparison site seems suspicious to me. Why not use multiple comparisons? Was the study pre-registered, and if not, why not?