Hacker News
by teraflop 4139 days ago
For what it's worth: a scatter plot with lots of huge points, like the one you've drawn for "upvotes vs. comments", is pretty useless for drawing conclusions about the data. It tells you about the support of the joint distribution (the region on which it's non-zero) but very little about its shape.

In particular, that graph could represent a fairly strong correlation (in the R^2 sense), or a fairly weak one, or anything in between. If you want to say something more quantitative about the data, you can do a linear regression and look at the coefficients and residuals.
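
As a rough sketch of that regression check (assuming NumPy; the data here is synthetic, and the names `upvotes` and `comments` are stand-ins, not the post's actual dataset), something like this would give the coefficients, residuals, and R^2:

```python
import numpy as np

# Synthetic stand-in data (hypothetical; NOT the dataset from the post).
rng = np.random.default_rng(0)
upvotes = rng.integers(1, 500, size=1000).astype(float)
comments = 0.4 * upvotes + rng.normal(0, 30, size=1000)

# Ordinary least squares fit: comments ~ slope * upvotes + intercept.
slope, intercept = np.polyfit(upvotes, comments, deg=1)

# Residuals: what the straight line fails to explain.
residuals = comments - (slope * upvotes + intercept)

# R^2 = 1 - SS_res / SS_tot (fraction of variance explained by the line).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((comments - comments.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Pearson correlation; for a one-variable linear fit, r**2 equals R^2.
r = np.corrcoef(upvotes, comments)[0, 1]

print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f} r={r:.3f}")
```

Plotting `residuals` against `upvotes` is usually the next thing to look at, since a visible pattern there suggests a straight line is the wrong model.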

2 comments

The correlation between individual upvotes and comments isn't really what the post is about; it's purely an illustration and has no impact on the topic extraction or interpretation. For what it's worth, I did check the correlation coefficient between the two sets (it's 0.81).
When there's too much data for a scatter plot, a heat map will do nicely.
Or pass alpha=0.3 to the plotting function.
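
For anyone wanting to try both suggestions, here is a minimal sketch. It assumes NumPy and matplotlib (the thread doesn't name the plotting library, so `scatter` and `pcolormesh` are my guess at "the plotting function"), and the data and the helper `plot_both` are made up for illustration:

```python
import numpy as np

# Synthetic stand-in data (hypothetical; NOT the dataset from the post).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5000)
y = 0.8 * x + rng.normal(0.0, 0.6, size=5000)

# Heat map approach: bin the points on a grid and count per cell.
counts, xedges, yedges = np.histogram2d(x, y, bins=40)


def plot_both(path="density.png"):
    """Draw the heat map and an alpha-blended scatter side by side."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, (ax_heat, ax_alpha) = plt.subplots(1, 2, figsize=(10, 4))

    # histogram2d returns counts indexed [x_bin, y_bin]; pcolormesh
    # expects rows to run along y, hence the transpose.
    ax_heat.pcolormesh(xedges, yedges, counts.T, cmap="viridis")
    ax_heat.set_title("2D histogram (heat map)")

    # Semi-transparent points: overlapping regions render darker.
    ax_alpha.scatter(x, y, s=10, alpha=0.3)
    ax_alpha.set_title("scatter with alpha=0.3")

    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_both()` writes the figure to `density.png`. The alpha trick is less work, but it saturates once enough points overlap, which is where the binned version tends to hold up better.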