| HN Mirror



	by Rochus 1457 days ago
	How do you know the sample is representative? If I look at the data posted by the OP, there are about 450 Clojure jobs a year. If we assume that developers switch jobs every three years at the median, we get about 1,350 Clojure jobs in total. If we assume a very good survey response rate of about 3%, then about 40 of these developers responded; that is even fewer than the number of states in the US.
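
A rough sketch of this arithmetic in Python. The 450 jobs per year, the three-year tenure, and the 3% response rate are the figures assumed in the comment above; the function and parameter names are made up for illustration.

```python
def estimate_respondents(jobs_per_year, tenure_years, response_rate):
    """Return (estimated total jobs, estimated number of survey respondents)."""
    # If every position turns over once per tenure period, the stock of jobs
    # is roughly the yearly job count times the tenure in years.
    total_jobs = jobs_per_year * tenure_years
    # Assumed share of those developers who answer the survey.
    respondents = total_jobs * response_rate
    return total_jobs, respondents


total_jobs, respondents = estimate_respondents(450, 3, 0.03)
print(f"total jobs: {total_jobs}, respondents: {respondents:.1f}")
```

Changing any of the three inputs shows how strongly the result depends on them.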

2 comments

nequo 1457 days ago

Whether a sample is representative is a different issue from whether the response rate is high.

For example, you can have a population of 10,000 jobs, 9,000 of which are hiring for Clojure and 1,000 of which are hiring for Forth. If you sample only the 9,000 Clojure jobs, then you might conclude that 100% of all 10,000 jobs are for Clojure. But in reality, only 90% are.

Instead, you can sample 100 of the 10,000 jobs at random. The expected share of Clojure jobs in that sample is 90%. There will be noise, but that can be accounted for statistically.
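
A minimal simulation of this example, assuming a simple random sample drawn without replacement. The seed and variable names are arbitrary; the standard error uses the usual formula for a sample proportion with a finite population correction.

```python
import math
import random

POPULATION_SIZE = 10_000
CLOJURE_JOBS = 9_000
SAMPLE_SIZE = 100

# 1 marks a Clojure job, 0 marks a Forth job.
population = [1] * CLOJURE_JOBS + [0] * (POPULATION_SIZE - CLOJURE_JOBS)

rng = random.Random(42)  # arbitrary fixed seed so reruns give the same draw
sample = rng.sample(population, SAMPLE_SIZE)  # random sample without replacement
sample_share = sum(sample) / SAMPLE_SIZE

# The "noise": standard error of the sample share around the true 90%.
true_share = CLOJURE_JOBS / POPULATION_SIZE
fpc = math.sqrt((POPULATION_SIZE - SAMPLE_SIZE) / (POPULATION_SIZE - 1))
standard_error = math.sqrt(true_share * (1 - true_share) / SAMPLE_SIZE) * fpc

print(f"sample share: {sample_share:.2f}, standard error: {standard_error:.3f}")
```

The standard error comes out at roughly three percentage points, so a random sample of 100 typically lands within a few points of the true 90%.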

If the population that you want to draw conclusions about is, say, the complete universe of jobs ever offered in the US in 2021, it will be difficult to find either a data set that contains this universe or a data set that is arguably a random subset of the universe. So representativeness is hard.

You could adjust your population definition to achieve plausible representativeness. For example, take the population of all developer jobs at companies that had an IPO between 2018 and 2021. Maybe you have a way to compile this data set from some source. Then you limit the scope of your claims but you will be more credible.

Another thing that you can do is take an existing data set that you know to be representative and compare the distribution of job characteristics in your sample to that. For example, you might find that your sample is more likely to include web development jobs than your reference data set. Then you know that your sample is not representative, and you know in what way it isn't. Or you might find that your sample is comparable to your reference data set. This can give you some confidence that your findings generalize.
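
One way to sketch such a comparison is a chi-square goodness-of-fit statistic. All categories and numbers below are hypothetical and only illustrate the mechanics; the critical value 7.815 is the 5% threshold for 3 degrees of freedom (four categories minus one).

```python
# Hypothetical category shares in a reference data set assumed to be representative.
reference_shares = {"web": 0.40, "backend": 0.35, "data": 0.15, "embedded": 0.10}
# Hypothetical category counts in the sample being checked.
sample_counts = {"web": 120, "backend": 50, "data": 20, "embedded": 10}


def chi_square_statistic(observed, expected_shares):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed.values())
    statistic = 0.0
    for category, share in expected_shares.items():
        expected = total * share
        statistic += (observed.get(category, 0) - expected) ** 2 / expected
    return statistic


statistic = chi_square_statistic(sample_counts, reference_shares)
CRITICAL_5PCT_DF3 = 7.815  # chi-square 5% critical value, 3 degrees of freedom
looks_comparable = statistic <= CRITICAL_5PCT_DF3

# Per-category gap between sample share and reference share shows *how* they differ.
total = sum(sample_counts.values())
gaps = {c: sample_counts[c] / total - reference_shares[c] for c in reference_shares}

print(f"chi-square: {statistic:.2f}, comparable: {looks_comparable}")
print(max(gaps, key=gaps.get), "is the most over-represented category")
```

With these made-up numbers the statistic is far above the threshold, and the per-category gaps point at web development jobs as the over-represented group, mirroring the case described above.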


Rochus 1457 days ago

Agreed. Even if we don't know the distribution or representativeness of the samples, we can make some guesstimates, as I just did. They are as good or as bad as what you see on TIOBE or other ranking services. The error is likely to be larger the smaller the population under consideration, which is why I'm rather skeptical about the salary statistics on Clojure.
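
To put a rough number on how the error grows as the number of responses shrinks, here is a sketch of the 95% margin of error for a survey proportion under the normal approximation. It assumes a simple random sample and covers sampling noise only, not bias; the sample sizes are illustrative, with 40 taken from the estimate earlier in the thread.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """95% margin of error for a proportion (normal approximation)."""
    # proportion=0.5 is the worst case, giving the widest interval.
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (40, 400, 4000):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

The margin shrinks with the square root of the sample size, and the standard error of a mean such as an average salary follows the same pattern.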


fulafel 1457 days ago

Why would you think that this data captures the number of Clojure jobs?

I don't know many Clojure devs who found their job by looking for a Clojure dev job ad. Even the yearly State of Clojure survey has ~2400 responses.


Rochus 1457 days ago

I made two simplifications in my thinking. First, I assume that the number of Clojure jobs is constant over the average job retention period. Second, I assume that each job change is advertised exactly once, or that the OP's evaluation ignores redundant advertisements for the same job. If the latter assumption were not true, the number of actual jobs would be smaller by the redundancy factor. There are studies on the job retention period (e.g. https://hackerlife.co/blog/san-francisco-large-corporation-e...), from which I just took the longest one for a conservative estimate.
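
A small sketch of how the second assumption affects the estimate. The redundancy factor (average number of ads per actual job change) is a hypothetical parameter; the values tried below are illustrative, not measured.

```python
def estimated_jobs(ads_per_year, retention_years, redundancy_factor=1.0):
    """Estimate the stock of jobs from yearly ads, assuming a constant job count."""
    if redundancy_factor < 1.0:
        raise ValueError("each job change is advertised at least once")
    # Dividing by the redundancy factor removes duplicate ads for the same job.
    return ads_per_year * retention_years / redundancy_factor


for redundancy in (1.0, 1.5, 2.0):
    print(f"redundancy {redundancy}: about {estimated_jobs(450, 3, redundancy):.0f} jobs")
```

A factor of 1.0 reproduces the 1,350 figure; any duplication in the ads only lowers it.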

The number of responses to the Clojure survey is hardly representative of the size of the job market; otherwise you would have to expect millions of respondents for, e.g., the JavaScript survey. People responding to such surveys are likely fans of the technology, not necessarily professionals who were hired to use it. Of course there are also professionals among the respondents, but we don't know the exact proportion.
