Hacker News
by datahipster 4187 days ago
> Spatial statistics aren't the same as regular statistics

I've always been frustrated with the gap between statistics and spatial statistics. For example, some of the methodologies for conducting hot-spot analysis are somewhat misleading, especially to uninformed geospatial analysts. Esri [0], for instance, implements this by first conducting geospatial aggregation, then calculating z-scores based on Gaussian assumptions, and then generating a corresponding "p-value" to extract "statistically significant areas", which are coined "hot spots". At that point, an analyst typically color-codes those p-values, showing regions with low p-values as "extreme" areas of interest. I'm really curious whether there's any empirical or anecdotal research that validates this methodology.
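To make that pipeline concrete, here is a minimal sketch of the three steps (aggregate, z-score, p-value). It assumes a Getis-Ord Gi*-style local statistic with binary weights over each cell's 3x3 neighbourhood. The function names and the square-grid layout are made up for illustration, and this is not Esri's actual implementation.

```python
import math
from collections import Counter
from statistics import NormalDist


def aggregate_to_grid(points, cell_size):
    """Step 1: count points per square grid cell (hypothetical aggregation scheme)."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )


def gi_star(counts, width, height):
    """Step 2: Gi*-style z-score per cell on a width x height grid.

    Assumes binary weights: 1 for the cell itself and its touching
    neighbours, 0 for everything else.
    """
    n = width * height
    if n < 2:
        raise ValueError("need at least two cells")
    values = {(i, j): counts.get((i, j), 0) for i in range(width) for j in range(height)}
    mean = sum(values.values()) / n
    variance = sum(v * v for v in values.values()) / n - mean ** 2
    s = math.sqrt(max(0.0, variance))
    z = {}
    for (i, j) in values:
        neigh = [
            values[(a, b)]
            for a in (i - 1, i, i + 1)
            for b in (j - 1, j, j + 1)
            if (a, b) in values
        ]
        w = len(neigh)  # with binary weights, sum(w) == sum(w^2) == neighbourhood size
        numerator = sum(neigh) - mean * w
        denominator = s * math.sqrt((n * w - w * w) / (n - 1))
        z[(i, j)] = numerator / denominator if denominator > 0 else 0.0
    return z


def two_sided_p(z_score):
    """Step 3: p-value under the assumption that the z-score is standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_score)))
```

The last step is where the Gaussian assumption enters: the "p-value" is only meaningful if the local statistic is approximately normal, which is not guaranteed for sparse or heavily skewed counts.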

There have been some attempts to normalize sampled data. Location Quotient [1] (and Standardized Location Quotient), for example, compares a local measure to a global measure. However, this too relies on Gaussian assumptions and doesn't properly account for variance in the data.
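For reference, a rough sketch of the location quotient as a ratio of a local share to a global share. The standardized variant here is assumed to be a plain z-score of the LQ values across regions (some treatments take logs first), so read that part as one possible definition. All names are illustrative.

```python
from statistics import mean, stdev


def location_quotient(local_count, local_total, global_count, global_total):
    """Local share of an activity divided by its global share.

    LQ > 1 means the activity is over-represented locally.
    """
    local_share = local_count / local_total
    global_share = global_count / global_total
    return local_share / global_share


def standardized_lq(lqs):
    """Assumed definition: z-score of each region's LQ across all regions."""
    mu = mean(lqs)
    sd = stdev(lqs)  # sample standard deviation; needs at least two regions
    return [(lq - mu) / sd for lq in lqs]
```

Note that the ratio itself carries no information about sample size: a region with 1 event out of 10 gets the same LQ as one with 1,000 out of 10,000, which is the variance problem mentioned above.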

I would definitely love to see a hierarchical Bayesian spatial model that takes into account a geospatial prior (such as the overall density of tweets), allowing you to solve for the posterior of cluster centers. Has anyone seen this done before?
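As a toy illustration of the idea (much simpler than a real hierarchical model), here is a sketch that computes a posterior over a single cluster center on a set of candidate locations. The assumptions are all mine: each point comes either from a background density (standing in for something like overall tweet density) or from one isotropic Gaussian cluster with known width, the mixing weight is fixed, and the prior over the center is proportional to the background density. `background`, `sigma`, and `mix` are hypothetical names.

```python
import math


def gaussian_2d(x, y, cx, cy, sigma):
    """Isotropic 2D Gaussian density centred at (cx, cy)."""
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def cluster_center_posterior(points, candidates, background, sigma=1.0, mix=0.3):
    """Posterior probability of each candidate being the cluster centre.

    background(x, y) must return a strictly positive density; it serves both
    as the baseline component of the likelihood and as the prior over centres.
    """
    log_post = []
    for cx, cy in candidates:
        lp = math.log(background(cx, cy))  # geospatial prior on the centre
        for x, y in points:
            lp += math.log(
                (1 - mix) * background(x, y) + mix * gaussian_2d(x, y, cx, cy, sigma)
            )
        log_post.append(lp)
    # Normalise in log space to avoid underflow with many points.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

A proper hierarchical version would also put priors on the number of clusters, their widths, and the mixing weight, and would likely need MCMC or variational inference rather than a grid.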

[0] http://resources.arcgis.com/en/help/main/10.1/index.html#//0...

[1] http://www.bea.gov/faq/index.cfm?faq_id=478