by dkarl 5121 days ago
an economy where we do not need statistics since we have all the data!

I simultaneously feel his elation and am mystified. If you have such a quantity of data that you must relate to it via statistics, how is that better than only having the statistics? Is it because you can define and calculate your own creative statistics, or...?

4 comments

Off the top of my head: you no longer need to sample to obtain your statistics, and you can obtain good statistics on virtually everything and every sub-component of the economy, sliced every which way, as many times as you like -- which is much harder in the real world. You can't just go out and gather information on sales of all red, oblong widgets in shops owned by couples aged 50-60 with one dog and one cat -- but you can in a game economy.

Basically it's a huge data laboratory where you are able to record every transaction in the world, and work from there. That's massively useful to economists, because they can now test a far wider range of hypotheses and, furthermore, eliminate them much more quickly.

Even the real data should contain noise relative to a true hypothesis. For example, you could predict that a certain (for the purpose of the example, small) percentage of all players will buy a certain ship. But if many of them just stop playing by chance, your prediction fails even if the underlying theory is correct. [1] I believe the main advantage is that you can get rid of sampling bias: you can look at the entire economy, not only at the part that is willing to answer a poll. For a nice real-world example, see [2].

[1] https://en.wikipedia.org/wiki/False_negatives#Type_I_error

[2] http://neuroanthropology.net/2010/07/10/we-agree-its-weird-b...

> "But many of them just stop playing by chance"

I suspect this will be one of the significant weaknesses in trying to translate anything learned in a game economy into real economies. You have much better data in a game economy, but people do grow bored or decide they're just not good enough, and move on. Or people decide they've taken a stupid path and reset their account, and build from scratch. In real life, most people try to keep living even if they're at the bottom end of the economy and don't have the means to change that. This strikes me as a fundamental difference between the virtual and the real.

Great, I learn by being misunderstood :) I was thinking of people who stop playing because of outside events, like a new season of "Game of Thrones" or a girl, not because of in-game events.

However, there is of course a fundamental bias in game economies. Another example would be that in game economies people use unethical tactics, like simply killing the competition. (I have not heard of Bing hordes raiding the Google HQ.)

This type of 'noise' is easy to weed out. It's not really even noise--it's just a question of how he wants to define his population.

With a good team of people who know SQL, he can pull data from any population he chooses, and can make any statement about that population he has data on, and if his arithmetic is valid, those facts are without any sort of 'noise'.
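As a rough illustration of the point about pulling any population with SQL, here is a minimal sketch using Python's built-in sqlite3 module. The `transactions` table, its columns, the rows, and the `exact_totals` helper are all made up for the example; nothing here reflects the game's actual schema.

```python
import sqlite3

# Hypothetical transaction log; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (player_id INTEGER, item TEXT, color TEXT, price REAL)"
)
rows = [
    (1, "widget", "red", 10.0),
    (2, "widget", "blue", 12.0),
    (3, "widget", "red", 14.0),
    (4, "ship", "red", 500.0),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)


def exact_totals(conn, item, color):
    """Return (count, total price) over every matching transaction, no sampling."""
    cur = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(price), 0.0) FROM transactions "
        "WHERE item = ? AND color = ?",
        (item, color),
    )
    return cur.fetchone()


count, total = exact_totals(conn, "widget", "red")
print(count, total)
```

Because the query runs over every recorded row, the count and total are exact for whatever slice is chosen; the only judgment call left is how the population is defined in the WHERE clause.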

On the other hand, this is economics, and if he ever wants to make a valid prediction, he's going to have to use forecasting. Forecasting intentionally adds noise, so in that sense, having the population doesn't eliminate the existence of noise in his work.

But I really don't see what you mean by 'noise' in this case. He literally has all of the facts. There is nothing fuzzy about the data, as long as he has the data.

The discipline of statistics can be defined as the set of techniques used to draw inferences and conclusions from a sample--an incomplete set of data. When you have all of the data, playing with it ceases to be statistics and instead becomes data analysis.

> "how is that better than only having the statistics?"

It's the quality of the stats that is the difference. E.g., when economists work out the GDP, it's a case of thumb-sucking a number based on a few factors which in turn are also educated guesses. With a digital economy you can just throw it in an Excel sheet and get a perfectly accurate answer. That's what he means by "having the data".

A statistic estimates a parameter of a population. If you have the population, you don't need statistics because you already know the parameters.
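To make the statistic-versus-parameter distinction concrete, here is a small sketch in Python. The "population" is a made-up list of purchase prices generated with a fixed seed, and the sample size of 100 is an arbitrary choice; both are assumptions for illustration only.

```python
import random
import statistics

# Hypothetical "whole population": every purchase price in a toy economy.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.randint(1, 100) for _ in range(10_000)]

# With the full population, the mean is a parameter: computed, not estimated.
population_mean = statistics.fmean(population)

# With only a sample, the mean is a statistic that estimates the parameter.
sample = rng.sample(population, 100)
sample_mean = statistics.fmean(sample)

# Standard error of the sample mean: the uncertainty that sampling introduces.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f} +/- {standard_error:.2f}")
```

The first number carries no error bar because nothing was left out; the second needs one because 9,900 of the records were never looked at.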

Having 'all the data' sidesteps any possible need to use statistics outside of forecasting, which he doesn't seem too fond of anyways.