by gaius 5424 days ago
Who has a better methodology? Is there even a good one?

If that was a serious question: places that build developer tools and programming languages for a living (at least MSFT, and I assume others as well) pay ridiculous amounts of money to independent third-party companies that specialize in gathering this kind of data. The raw data was kept fairly private (marketing and upper management only), but rank-and-file employees would occasionally see some of it when things such as trends in the number of VBA, VB5, or VC++ programmers appeared in slide decks about the direction of upcoming product versions.

Having worked with the raw data, I can say it was pretty fantastic. It was segmented by industry and business size, and it handled issues like developers who use multiple programming languages, or companies where one section used one language and other sections used others. We even knew which tools and add-ons were used with which languages, and which compiler was used on each platform (e.g. how many commercial shops using C++ to target Linux use gcc vs. icc?).

But that data was also stunningly expensive. My marketing friends tell me that accurate market data always is.

Interesting. It's a tautology, but the Internet sees only the Internet: there's a huge swathe of programming work that just isn't advertised online, so it is invisible to TIOBE.
http://www.langpop.com is better because

1) I use more metrics and

2) I don't go around making statements about how, from one month to the next, someone displaced someone else, or climbed into a certain ranking, or things like that.

3) I let people reweight the chart based on the metrics they like.
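Point 3 amounts to a weighted average over data sources. Below is a minimal sketch of how such a reweighting could work, assuming each source is first scaled so its top language scores 1.0 (so raw search hits and job-ad counts become comparable) and then averaged with user-chosen weights. The function name and the max-normalization are hypothetical; this is not LangPop's actual code.

```python
from typing import Dict


def reweighted_scores(
    metrics: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-source popularity counts into one score per language.

    metrics: source name -> {language: raw count}
    weights: source name -> user-chosen weight (missing sources count as 0)
    Assumption: each source is max-normalized before weighting.
    """
    total_weight = sum(weights.get(src, 0.0) for src in metrics)
    if total_weight <= 0:
        raise ValueError("at least one source needs a positive weight")

    combined: Dict[str, float] = {}
    for src, counts in metrics.items():
        w = weights.get(src, 0.0)
        if w == 0 or not counts:
            continue
        top = max(counts.values())
        if top == 0:
            continue
        for lang, count in counts.items():
            # A language absent from a source simply contributes 0 for it.
            combined[lang] = combined.get(lang, 0.0) + w * (count / top)

    # Dividing by the total weight keeps every score in [0, 1].
    return {lang: s / total_weight for lang, s in combined.items()}
```

Changing the weights reorders the languages without touching the underlying data, which is what lets readers favor, say, job postings over search hits.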

I think the numbers LangPop comes up with are pretty good, but by their very nature are a bit fuzzy.

F# isn't listed on any of your charts. I think it's probably popular enough to show up in all of your data sources at this point.

Also, maybe add GitHub and StackOverflow as data sources?

F# should probably go there, yeah.

GitHub and StackOverflow started out really biased in terms of their communities - GitHub with Ruby, and StackOverflow with Microsoft languages. Do you think they've sufficiently lost that bias?

On another note, you know what else could be a great source(s) for data? Google Scholar, CiteSeerX and arXiv. It'd be really interesting to compare the language usage between "industry" and academia.
I think StackOverflow has; GitHub still seems a bit biased towards scripting languages like Ruby or JavaScript, though. In any case, since you show the graphs from the various sources, I don't think it matters if one source is more biased than another. If, for example, you used CodePlex as a data source, you'd see a huge bias towards C#, but visitors to your site could simply draw their own conclusions from the charts.
Where can I see the data for GitHub?

I'm not sure I even trust that... I write lots of bindings from OCaml to C, and whereas I consider them to be OCaml projects, GitHub sees that they're mostly C by lines of code and counts them as C.
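To see why thin bindings get misclassified, here is a minimal sketch of "majority by lines of code" classification as the comment describes it. The extension table and function name are hypothetical, and this is not GitHub's real classifier, which uses its own rules.

```python
from collections import Counter
from pathlib import PurePath
from typing import Iterable, Tuple

# Hypothetical extension map; real classifiers use far richer rules.
EXTENSIONS = {".ml": "OCaml", ".mli": "OCaml", ".c": "C", ".h": "C"}


def primary_language(files: Iterable[Tuple[str, int]]) -> str:
    """Return the language with the most lines across (path, line_count) pairs."""
    totals: Counter = Counter()
    for path, lines in files:
        lang = EXTENSIONS.get(PurePath(path).suffix)
        if lang is not None:
            totals[lang] += lines
    if not totals:
        return "Unknown"
    # Whichever language has the most lines wins outright.
    return totals.most_common(1)[0][0]
```

With made-up sizes such as `[("bindings.ml", 300), ("bindings.mli", 50), ("stubs.c", 900), ("stubs.h", 40)]`, the C stubs outweigh the OCaml interface and the project is reported as C, even though its purpose is to be used from OCaml.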

If nearly all Ruby programmers put their code on GitHub, and you don't count it but you do count Google Code, then isn't that a bias in the sample? Wouldn't it make more sense to include all the major code hosting sites, including GitHub?
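The suggestion amounts to pooling project counts across every major host before computing shares. Here is a minimal sketch, assuming per-host counts per language are available. The function name and inputs are hypothetical, and pooling does not remove bias: it only avoids dropping the host where one community is concentrated.

```python
from collections import Counter
from typing import Dict


def pooled_share(per_host: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Sum per-language project counts across hosts and return each language's share."""
    total: Counter = Counter()
    for counts in per_host.values():
        total.update(counts)  # Counter.update adds counts rather than replacing them
    n = sum(total.values())
    if n == 0:
        return {}
    return {lang: c / n for lang, c in total.items()}
```

Comparing the result with and without a given host shows how much a language's apparent share depends on which sites are sampled.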
Google's Code Search just searches for code on the internet.
Ah, sorry. I thought you were looking at Google Code repos.
This is, by the way, an honest question. I think at a certain point they'll be mainstream enough to say yes, but it's tough to say when.