by PeterStuer 436 days ago
The weird thing is their own data does not reflect this at all. The number of articles accessed by users, spiders and bots alike has not moved significantly over the last few years. Why these strange wordings like "65 percent of the resource-consuming traffic"? Is there non-resource-consuming traffic? Is this just another fundraising marketing drive? Wikimedia has been known to be less than truthful wrt their funding needs and spending.

https://stats.wikimedia.org/#/all-projects/reading/total-pag...

2 comments

The graph you linked seems to be about article viewing ("page views", like a GET request to https://en.wikipedia.org/wiki/Democracy for example), while the article is about multimedia content, i.e. fetching the actual bytes of https://en.wikipedia.org/wiki/Democracy#/media/File:Economis... for example, which would consume more bandwidth than just loading article pages, as far as I understand.
Multimedia content vs. articles. It's easy to see how bad scraping of videos and images pushes bandwidth up more than just scraping articles.

The resource-consuming traffic is clearly explained in the linked post:

> This means these types of requests are more likely to get forwarded to the core datacenter, which makes it much more expensive in terms of consumption of our resources.

I.e. the difference between cached content at the CDN edge vs. hits to core services.
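
To make the edge-vs-core point concrete, here is a rough toy model, not anything taken from Wikimedia's actual setup. It assumes a single edge node with a small LRU cache; the `EdgeCache` class, the `core_share` helper, the cache capacity and the URL patterns are all made up for illustration. Requests that miss the cache count as being forwarded to the core datacenter.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy LRU cache standing in for one CDN edge node (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0  # every miss is treated as a request to the core datacenter

    def get(self, url):
        if url in self.store:
            # Served from the edge: refresh recency, no core traffic.
            self.store.move_to_end(url)
            self.hits += 1
            return
        # Not cached: forward to core, then cache the response.
        self.misses += 1
        self.store[url] = True
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry


def core_share(urls, capacity=100):
    """Fraction of requests that had to be forwarded to the core."""
    cache = EdgeCache(capacity)
    for url in urls:
        cache.get(url)
    return cache.misses / len(urls)


# Assumed human-like traffic: many requests concentrated on a few popular pages.
human = [f"/wiki/Page_{i % 50}" for i in range(10_000)]
# Assumed crawler-like traffic: every request goes to a different long-tail file.
crawler = [f"/media/File_{i}" for i in range(10_000)]

human_share = core_share(human)
crawler_share = core_share(crawler)
print(f"human traffic forwarded to core:   {human_share:.1%}")
print(f"crawler traffic forwarded to core: {crawler_share:.1%}")
```

Both streams have the same number of requests, so a page-view style count would look identical, but in this sketch almost none of the human-like requests reach the core while every crawler-like request does. Real caches and traffic are messier than this, so treat the numbers as illustrative only.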