Hacker News
by synference 3682 days ago
I'm one of the authors of the article. Quick answer: The data shows that if you're a journalist about to write your next article, you're likely to get more views if you write it about Clinton rather than Trump. It's true that Trump got more pageviews overall, but that seems to be mostly because way more articles were written about him in the first place.

Long answer: In the article we suggest that if publishers had written more articles about Clinton, they would have received more page views, because in the data we observe that posts on Clinton receive more page views on average. Similarly, we suggest that writing more articles on Bernie Sanders would have increased referrals from social and search. As with all non-experimental approaches to causal inference, valid conclusions require strong assumptions. In this analysis, we assume that the average number of page views that articles on a candidate receive is independent of the number of articles written on that candidate. If writing more articles on Clinton, and fewer on Trump, would have caused Clinton articles to receive fewer views and Trump articles to receive more, then our conclusions might be wrong.
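A minimal sketch of the arithmetic behind this argument, assuming (as stated above) that average views per article do not depend on how many articles are written. The article counts and averages are hypothetical placeholders, not figures from the article's data, and `total_views` / `reallocate` are illustrative names:

```python
# Hypothetical sketch: counterfactual page views under the independence
# assumption (average views per article fixed regardless of article count).
# All numbers are made up for illustration.


def total_views(articles: dict[str, int], avg_views: dict[str, float]) -> float:
    """Total views = sum over candidates of (article count x average views)."""
    return sum(n * avg_views[c] for c, n in articles.items())


def reallocate(articles: dict[str, int], src: str, dst: str, k: int) -> dict[str, int]:
    """Return a copy of `articles` with k articles moved from src to dst."""
    if k > articles[src]:
        raise ValueError("cannot move more articles than src has")
    moved = dict(articles)
    moved[src] -= k
    moved[dst] += k
    return moved


articles = {"Trump": 1000, "Clinton": 400}
avg_views = {"Trump": 5000.0, "Clinton": 6000.0}

baseline = total_views(articles, avg_views)
shifted = total_views(reallocate(articles, "Trump", "Clinton", 200), avg_views)
print(baseline, shifted)  # prints: 7400000.0 7600000.0
```

Under this assumption, each article moved from Trump to Clinton adds the difference in averages (here 6000 - 5000 = 1000 views), which is exactly the part the diminishing-returns objection challenges.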


I feel like you're working from a broken model.

Trump is the head, and the other candidates are the long tail.

You said: "It's true that Trump got more pageviews overall, but that seems to be mostly because way more articles were written about him in the first place."

And: "we suggest that if publishers would have written more articles about Clinton, they would have received more page views, because in the data we observe posts on Clinton receive more page views on average"

This seems to be the wrong conclusion because of diminishing returns. Writing more articles about Clinton should still push down the average page views per Clinton article. There is only so much interest, and only so much new to write about every day. None of the other candidates can create fresh controversies to feed the media the way Trump can. The question is how much it would push the average down. Given that these sites operate in a market, I don't think it would be inaccurate to suggest that it would push it significantly below Trump's.

I don't believe there is a per-article floor that strong, where any article is guaranteed to get some absolute number of page views. If diminishing returns aren't present or are extremely weak, then I'm wrong.
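One way to make the diminishing-returns objection concrete is a toy model in which a candidate's average views per article fall as a power of the article count. The exponent `beta`, the functional form, and all numbers are assumptions for illustration, not estimates from the data: `beta = 0` reproduces the independence assumption, while `beta = 1` means a fixed pool of reader attention is split across however many articles are written.

```python
# Hypothetical observed (article count, average views per article).
OBSERVED = {"Trump": (1000, 5000.0), "Clinton": (400, 6000.0)}


def total_views(n: int, n_obs: int, avg_obs: float, beta: float) -> float:
    """Total views for one candidate if n articles are written.

    Toy model (an assumption): the per-article average is
    avg_obs * (n / n_obs) ** -beta, which equals avg_obs at n = n_obs.
    beta = 0 -> no diminishing returns; beta = 1 -> fixed total attention.
    """
    return n * avg_obs * (n / n_obs) ** -beta


def gain_from_shift(k: int, beta: float) -> float:
    """Change in total views from moving k articles from Trump to Clinton."""
    change = 0.0
    for name, (n_obs, avg_obs) in OBSERVED.items():
        n_new = n_obs - k if name == "Trump" else n_obs + k
        change += total_views(n_new, n_obs, avg_obs, beta)
        change -= total_views(n_obs, n_obs, avg_obs, beta)
    return change


for beta in (0.0, 0.5, 1.0):
    print(f"beta={beta}: gain={round(gain_from_shift(200, beta)):,}")
```

With these made-up numbers, moving 200 articles from Trump to Clinton gains 200,000 views at `beta = 0`, about 11,500 at `beta = 0.5`, and nothing at `beta = 1`, so whether the reallocation helps depends on how strong the diminishing returns actually are.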

If anything, the data doesn't rule out that sites/reporters are correctly maximizing Trump coverage. Or they may not be maximizing enough since absolute demand is so high and Trump generates so much fresh content. If you can write about one easy topic, and maintain an average that high with only a small decrease in the average, you are doing more with less.

In the comment above, I try to clearly lay out that my conclusion rests on the assumption that for a given candidate, avg views per article and number of articles are independent.

I agree that if you reject this assumption, and instead assume that there are 'diminishing returns', then the conclusion I arrived at could be wrong.

There probably is some kind of diminishing-returns effect, but we don't know how strong it is. It could be weak compared to the effect that 'readers will consume whatever journalists write'. It's pretty interesting that all of the last four leading candidates (Trump, Cruz, Clinton, and Sanders) had roughly comparable numbers of pageviews per post. That's evidence that readers pretty much read whatever is published (with the exception of Kasich, a long shot).

It's also true that if you're a journalist right now, faced with the current distribution of articles, you're likely to get more page views by writing your next article on Clinton. This claim doesn't rest on any strong assumptions. That could change if many more articles on Clinton are written, but it's true for now.

If you look at the data in the dashboard, it's also interesting to see that Bernie Sanders gets way more social and search referrals compared to Clinton and Trump.

I think this assumption is dangerous to begin with. It should be proven with data that there are no diminishing returns, and that would be a powerful finding worthy of attention.

And saying that readers will consume whatever journalists write helps power the narrative that the media fueled Trump's campaign. They could write about a different candidate and get slightly improved pageviews, but they're choosing to flood with Trump articles. Your data would only conclude that pageviews aren't driving it.

I think expanding this to "readers will consume whatever journalists write" is a different argument and you would need to establish your "experiment" with a different methodology than the approach used here. The causation seems to be "reporters write news, it exists to be consumed on a site" therefore "readers read it" and that feels like it's missing something to me.

Also, it could be interesting that they have comparable numbers per post, but that also backs up the idea that articles exist in response to a demand-supply feedback loop. If sites respond to pageviews, then candidates with lower average pageviews will simply not get as much media attention.

You bring up a good point. Journalists should explore to what extent there are diminishing returns to writing articles on other candidates. Statistics tells us that the most efficient way of exploring this hypothesis is with a multi-armed bandit algorithm. But before I go into that, I think it makes sense to break this problem down into two questions:

(1) Given equally interesting ideas for articles to write on each candidate, which candidate should a journalist write on?

(2) How much investment is required to write an interesting article on each candidate? It might require less work to write something interesting on Trump than on Clinton.

The right way of answering question (1) is with a dynamic multi-armed bandit algorithm. Such an algorithm explores the problem of diminishing returns as it goes. At this point, given the data we have, such an algorithm would suggest you write on Clinton the vast majority of the time if you're interested in page views per article, and would suggest you write about Sanders if you're interested in bringing in external referrals from Facebook and Google. If journalists followed the advice of such an algorithm and wrote so many articles on Clinton that readers started to lose interest, then the algorithm would begin to suggest you write on someone else. If there's enough interest in this article, I might write up a follow-up where I fit a model that tells journalists what topic to write on, assuming it's just as easy to write an article on each topic. I could update this model every once in a while to make sure it detects those diminishing returns in time.
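The comment doesn't specify which bandit algorithm it has in mind. As one possible sketch, here is an epsilon-greedy bandit whose value estimates use a constant step size, so recent rewards count more and the estimates can follow an arm whose payoff falls over time. The class name, the reader model (`BASE`, `DECAY`, `NOISE_SD`), and every parameter value are hypothetical; the rewards are simulated, not taken from the article's data.

```python
import random


class DiscountedEpsilonGreedy:
    """Epsilon-greedy bandit with recency-weighted value estimates.

    A constant step size `alpha` makes each estimate an exponentially
    weighted average of an arm's past rewards, so it can follow an arm
    whose payoff drifts down (e.g. readers tiring of one candidate).
    """

    def __init__(self, arms, epsilon=0.1, alpha=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.alpha = alpha
        self.rng = random.Random(seed)
        self.values = {a: 0.0 for a in self.arms}
        self.pulls = {a: 0 for a in self.arms}

    def choose(self):
        # Try every arm once before trusting the estimates.
        for a in self.arms:
            if self.pulls[a] == 0:
                return a
        # Explore at random with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.values.get)

    def update(self, arm, reward):
        self.pulls[arm] += 1
        if self.pulls[arm] == 1:
            self.values[arm] = reward  # first reward initializes the estimate
        else:
            self.values[arm] += self.alpha * (reward - self.values[arm])


# Hypothetical reader model (not the article's data): expected views per
# article start at BASE[c] and shrink as more articles on c are published.
BASE = {"Trump": 5000.0, "Clinton": 6000.0, "Sanders": 4000.0}
DECAY = 0.01      # assumed strength of diminishing returns
NOISE_SD = 500.0  # assumed article-to-article variation


def simulate(rounds=500, seed=0):
    """Let the bandit pick the topic of each new article; return pick counts."""
    rng = random.Random(seed + 1)  # separate stream for simulated readers
    bandit = DiscountedEpsilonGreedy(BASE, seed=seed)
    for _ in range(rounds):
        c = bandit.choose()
        # Expected views fall with the number of articles already on c.
        mean = BASE[c] / (1 + DECAY * bandit.pulls[c])
        views = max(0.0, rng.gauss(mean, NOISE_SD))
        bandit.update(c, views)
    return bandit.pulls


print(simulate())
```

In this simulation the bandit first favors the arm with the highest base views and shifts toward the others as that arm's simulated payoff falls. A discounted or sliding-window variant of UCB or Thompson sampling would be a reasonable alternative choice.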

Question (2) is more difficult to answer and requires more domain knowledge. I would say it is possible at any moment to write hundreds of interesting articles on each candidate; the real question is how much work it takes. As I mention in the blog post, I am convinced that journalists find it easier to write interesting articles about Trump. So in some sense it's rational for them to do so: the 'return on investment' is higher because it's so cheap to churn out another article on Trump's latest soundbite. However, one could also argue that, in the name of increased page views or of a functioning democracy, they should make the extra effort to write interesting articles on the other candidates.
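The 'return on investment' framing can be written as expected views per hour of work. The view counts and hours per article below are invented for illustration; the point is only that a candidate with a higher per-article average can still lose on views per hour if its articles take longer to write.

```python
# Invented figures for illustration; not measurements from the article.
EXPECTED_VIEWS = {"Trump": 5000.0, "Clinton": 6000.0, "Sanders": 4500.0}
HOURS_PER_ARTICLE = {"Trump": 2.0, "Clinton": 4.0, "Sanders": 3.0}


def views_per_hour(candidate: str) -> float:
    """Return on investment: expected views divided by writing effort."""
    return EXPECTED_VIEWS[candidate] / HOURS_PER_ARTICLE[candidate]


best = max(EXPECTED_VIEWS, key=views_per_hour)

# Hours a Clinton article could take while still matching Trump's
# views per hour.
break_even_hours = EXPECTED_VIEWS["Clinton"] / views_per_hour("Trump")

print(best, views_per_hour(best), break_even_hours)  # prints: Trump 2500.0 2.4
```

With these numbers, Clinton coverage would win on views per hour only if a Clinton article took less than 2.4 hours, which is one way to put a number on the 'extra effort' described above.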