by evanelias 660 days ago
> The back-end database of Tumblr is reportedly very simple

I'm not sure what your source is for this, but it's not correct. The combination of scale, age, and number of product features makes it quite challenging.

My knowledge of Tumblr's db infra is about 6 years out of date, but by my math they hit the milestone of 1 trillion distinct relational rows (on primary databases alone, i.e. excluding copies on replicas) a few years back already.

During Tumblr's peak popularity (~2012 to early 2013), the daily posting rate at times exceeded 100 million posts/day. For the sake of comparison, Twitter reportedly received about 5x that number at the time, but Tumblr posts are larger and far more media-heavy on average. So it's fair to say the total data volume was almost comparable.

That all said, the scope of this migration pre-announcement isn't totally clear. My assumption is they just plan to move the public blog network front-end to WordPress, possibly using some sort of shim layer. But an important point here is the blog network is a minority of Tumblr's traffic, and always has been. Most HN users who have never actually created a Tumblr account fail to understand this: the popular part of Tumblr is the social network / logged-in dashboard experience, not the public-facing blog network.

If they plan to move the entire site/backend over to WP, that's a much more challenging migration. The ID mappings and differing sharding schemes alone make this an absolutely massive effort.
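To make the ID-mapping and sharding point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two placement functions (`shard_by_modulo`, `shard_by_range`), the helper names, and the tiny ID sets are illustrative assumptions, not Tumblr's or WordPress's actual schemes. It only shows why two systems that place rows differently, and that have both already handed out the same numeric IDs, can't simply be merged.

```python
def shard_by_modulo(entity_id: int, shard_count: int) -> int:
    """Hypothetical source scheme: shard chosen by ID modulo shard count."""
    return entity_id % shard_count


def shard_by_range(entity_id: int, range_size: int) -> int:
    """Hypothetical target scheme: shard chosen by contiguous ID range."""
    return entity_id // range_size


def ids_that_move(ids, shard_count: int, range_size: int) -> list[int]:
    """IDs whose shard number differs between the two schemes."""
    return [
        i for i in ids
        if shard_by_modulo(i, shard_count) != shard_by_range(i, range_size)
    ]


def build_id_mapping(source_ids, taken_target_ids) -> dict[int, int]:
    """Map old IDs to target IDs, remapping any ID the target already uses.

    Fresh IDs start above every ID seen on either side, so a remapped
    row can never collide with a row that kept its original ID.
    """
    source_ids = sorted(source_ids)
    next_free = max([*source_ids, *taken_target_ids], default=0) + 1
    mapping = {}
    for old_id in source_ids:
        if old_id in taken_target_ids:
            mapping[old_id] = next_free  # collision: assign a fresh ID
            next_free += 1
        else:
            mapping[old_id] = old_id  # no collision: keep the ID
    return mapping


moved = ids_that_move(range(8), shard_count=4, range_size=2)
mapping = build_id_mapping([1, 3, 5], taken_target_ids={1, 2})
```

Even in this toy case most rows land on a different shard, and every remapped ID has to be rewritten everywhere it is referenced (foreign keys, caches, URLs). That is the part that gets painful at scale.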

source: long ago I was personally responsible for Tumblr's database and cache scalability during its original period of hockey-stick growth.


Here's the claim that I read: https://news.ycombinator.com/item?id=19418165

Changing all the database structures to match WP when they will not actually be running WP would not make much sense. I think they will leave all the tables exactly the same and just move them onto back-end hardware that is shared with WP, i.e. a separate database running on the same server farm. It would be nice if the press release were a bit less vague on this.

That commenter was referring to media files (images and video), not relational database infrastructure. Completely separate parts of the stack. Media files weren't my area at all, but AFAIK parts of that comment weren't factually accurate at the time it was made, and furthermore iirc that commenter was someone who was previously disgruntled about being laid off in one of the many Verizon-led rounds of staff reductions. (Being disgruntled about a layoff is totally understandable, but you then have to take their comments about the company with a huge heaping of salt...)

> Changing all the database structures to match WP when they will not actually be running WP would not make much sense.

No one suggested that exactly, so I'm not sure what you're referring to here.

> i.e. a separate database running on the same server farm

You keep describing Tumblr's database infrastructure as if it's a single server. It's not. It's multiple discrete tiers of sharded databases, serving different purposes. You don't put a trillion rows in a single database. They have hundreds of database servers, hundreds of cache servers, hundreds to a few thousand application/web servers, etc.
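As a rough illustration of what "multiple discrete tiers of sharded databases" means in practice, here is a minimal Python sketch. The tier names (`posts`, `dashboard`), the host names, the shard counts, and the modulo placement are all made up for the example. They are assumptions, not a description of Tumblr's real topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One hypothetical database tier with its own independent shard list."""
    name: str
    hosts: tuple[str, ...]

    def host_for(self, key: int) -> str:
        # Assumed placement rule: simple modulo over this tier's hosts.
        return self.hosts[key % len(self.hosts)]


# Hypothetical tiers; each one is sharded separately from the others.
TIERS = {
    "posts": Tier("posts", tuple(f"posts-db-{i}" for i in range(4))),
    "dashboard": Tier("dashboard", tuple(f"dash-db-{i}" for i in range(3))),
}


def route(tier_name: str, key: int) -> str:
    """Pick the database host for a key within the named tier."""
    return TIERS[tier_name].host_for(key)
```

The same key can resolve to unrelated hosts depending on the tier, so there is no single "the database" to pick up and relocate.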

Again, Tumblr is not just a blogging/CMS platform. It's a full social network, with the primary user experience being a dashboard activity stream, similar to the logged-in experience of Twitter, Instagram, and Facebook.

The proposed migration here is about migrating some portion of Tumblr's codebase and data to run off of the wordpress.com infrastructure, likely to reduce operating costs long-term. It's not clear whether that will consist of migrating the entire thing, or if it's just about shimming the public blog network (i.e. not the social network part). But in any case this isn't about just moving hardware.