by electrograv 3069 days ago

Weighted-score rankings are statistically meaningless, no matter how carefully we hand-tune the weights. The only way to stay accurate and scientifically responsible in such things is to:

1) Formally define what is meant by "innovation" in terms of clearly measurable outcomes.

2) Measure this clearly defined quantity across all countries and at many sample points through time.

3) Try to separate out explanatory variables for the quantity being measured. Build data-driven statistical models of this formally defined "innovation" quantity, rather than using hand-tuned weights of various measures, as these "rankings" or "indexes" often do.

4) Try to predict a probability distribution of the "innovation" quantity, using the models developed in step 3.

Step 1 should be qualified with an explanation that this human-designed definition is imperfect, and that all results should be understood in the context of this formal definition.

Step 2 should be qualified with notes on any limitations in the sampling methodology (availability of data, etc.) and how they factor into the error margins.

Step 3 should be qualified with sufficient explanation that it's a model of reality derived from data, and therefore risks errors such as overfitting or underfitting.

Step 4 should be qualified with an explanation that this is a prediction based on the above model fit, and therefore is subject to potential errors compounded by any of the previous steps.

That would be the scientifically/statistically responsible and rigorous thing to do. But I suppose I'm crazy to expect Bloomberg to aim for any level of rigor in these "indexes".
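As a rough illustration of steps 3 and 4, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the data is synthetic, there is a single explanatory variable, the model is a straight line fitted by ordinary least squares, and the names `fit_line` and `predict_distribution` are hypothetical. The point is only that the weight on the explanatory variable comes from the data, and that the output is a distribution rather than a single score.

```python
import math
import random
from statistics import NormalDist, mean

# Synthetic stand-in data (hypothetical): x is one cheap explanatory variable,
# y is the formally defined outcome measured in step 2.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]


def fit_line(xs, ys):
    """Step 3: ordinary least squares for y = a + b*x.

    Returns (a, b, residual_sd). The slope b is estimated from the data,
    not hand-tuned.
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (n - 2))


def predict_distribution(x_new, xs, a, b, residual_sd):
    """Step 4: approximate predictive distribution for a new observation.

    Uses a normal approximation; under the usual regression assumptions
    the exact result is a Student t with n - 2 degrees of freedom.
    """
    n = len(xs)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    se = residual_sd * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return NormalDist(mu=a + b * x_new, sigma=se)


a, b, residual_sd = fit_line(xs, ys)
dist = predict_distribution(6.0, xs, a, b, residual_sd)
low, high = dist.inv_cdf(0.025), dist.inv_cdf(0.975)
print(f"fitted: y = {a:.2f} + {b:.2f} * x, residual sd = {residual_sd:.2f}")
print(f"prediction at x = 6.0: mean {dist.mean:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

The interval only reflects the error of this particular fit. It does not account for an imperfect definition in step 1 or biased sampling in step 2, which is why each step needs its own qualification.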

chrisseaton 3069 days ago

> Formally define what is meant by "innovation" in terms of clearly measurable outcomes.

Surely you can see that this is not possible, and that any attempt would be superficial and would be gamed anyway?

electrograv 3069 days ago

“Innovation” is a very slippery term, so I agree that defining it would be harder than defining most other measures. For simplicity, let’s use something a bit more tractable, like a “quality of living index” or “overall human well-being index”.

It’s not feasible for a human to define such a metric formally in terms of ‘environmental variables’ (such as education stats, graduation rates, etc.), quite obviously, as you say. Yet it is trivial to define it as an “outcome measure”, where we directly measure the quantity in question (no matter how difficult or expensive this variable is to sample).

To the “quality of living index” example: One could design a polling methodology to gauge people’s overall happiness and satisfaction in a country fairly reliably. This polling would be expensive, so we couldn’t do it very broadly or very frequently. That’s why we use the subsequent steps described in my parent post: forming statistical models that separate out related variables we can measure easily and cheaply, and using them to approximate the “ground truth” happiness metric.

You can then use this “model fit” to predict the extremely expensive “ground truth” notion of individual happiness on a much more frequent and granular basis than would be feasible with a ground-truth gathering method like polling.
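To make that concrete, here is a small sketch under made-up assumptions: 500 hypothetical regions, one cheap proxy variable per region, and an expensive poll that covered only 60 of them. All of the data is simulated and the names (`proxy`, `happiness`, `fit_line`, `predict`) are illustrative only. The model is fitted on some of the polled regions, checked against polled regions it never saw, and then used to estimate the regions that were not polled.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical setup: every region has a cheap proxy value, but only the
# first 60 regions were covered by the expensive poll. The simulation
# generates "ground truth" everywhere; in reality it would be unknown
# for the unpolled regions.
proxy = [rng.uniform(0, 1) for _ in range(500)]
happiness = [3.0 + 4.0 * p + rng.gauss(0, 0.3) for p in proxy]
train = range(0, 40)       # polled regions used to fit the model
holdout = range(40, 60)    # polled regions kept back to check the fit
unpolled = range(60, 500)  # regions where only the proxy is available


def fit_line(idx):
    """Least-squares (intercept, slope) of happiness on proxy for the given regions."""
    xs = [proxy[i] for i in idx]
    ys = [happiness[i] for i in idx]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(train)


def predict(i):
    """Model estimate of happiness for region i, from its proxy alone."""
    return intercept + slope * proxy[i]


# Compare against a naive "predict the training average" baseline on held-out polls.
train_mean = mean(happiness[i] for i in train)
model_mae = mean(abs(happiness[i] - predict(i)) for i in holdout)
baseline_mae = mean(abs(happiness[i] - train_mean) for i in holdout)

# Fill in the regions that were too expensive to poll.
estimates = {i: predict(i) for i in unpolled}
print(f"holdout MAE: model {model_mae:.2f}, baseline {baseline_mae:.2f}")
print(f"estimated {len(estimates)} unpolled regions")
```

The held-out comparison is the part that matters: if the model does not beat the naive baseline on polls it has not seen, its estimates for the unpolled regions should not be trusted either.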