by blueberry 5807 days ago
Those who commented in favor of postgres: What are some of the biggest deployments of postgres? Facebook has a gargantuan deployment of MySQL and it just works. The author of the answer on Quora mentions reliable scalability as the most important factor for their MySQL choice in another answer here: http://bit.ly/dm6HtQ. What is the biggest system that postgres is deployed in?
3 comments

From http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyt... :

Yahoo builds two petabyte PostgreSQL database

James Hamilton writes about Yahoo's "over 2 petabyte repository of user click stream and context data with an update rate for 24 billion events per day".

It apparently is built on top of a modified version of PostgreSQL and runs on about 1k machines. In his post, James speculates on the details of the internals. Very interesting.

A better question would be: who uses MySQL as anything but a glorified key-value store when reaching large sizes? I personally have been involved in trying to use MySQL as a "real database" in the 10-100TB range and let me tell you, it's not pretty. I'm not sure about the open source PostgreSQL, but I know Greenplum has petabyte level warehouses running on a distributed version of it.

A few observations:

1) MyISAM's performance is highly dependent on certain idiosyncrasies of a lot of applications. Using MyISAM in this day and age is a very bad idea. InnoDB at least gets closer to real database behavior.

2) The "query optimizer" is insulting at best, and it actively gets in the way if you use it for much more than simple queries. Something more along the lines of what really large databases (as opposed to KV stores) get used for can implode the server.

Personally, I think too many people try and stick things in relational databases that don't belong there simply because they've got the hammer in their hand and it's easier than pulling out a screwdriver.

who uses MySQL as anything but a glorified key-value store when reaching large sizes?

My assumption is both Quora and Facebook use MySQL this way. While you are right that this is not using it as a real database, I want to know if PostgreSQL is deployed in a similar setup at all. Most people (including you) don't take into account the fact that there are many cases where MySQL (used as a KV store) proved to work, while I have never heard of such huge PostgreSQL deployments. If this were a general discussion regarding MySQL and PostgreSQL, I could understand that. However, I think the post is more about whether to choose MySQL or PostgreSQL if you are going to use it as a KV store.

The introduction of hstore actually lets you do this natively. I think it's partially that PostgreSQL people tend not to try and use the hammer as a screwdriver, but maybe that's just me.

Or maybe sometimes MySQL is a screwdriver being used as a hammer?

Some examples:

Skype uses PostgreSQL for their VOIP services.

Yahoo! has a customized PostgreSQL for their data warehouse, storing a couple of petabytes of data.

I believe IMDB is another quite prominent user.

Tineye.com, for those who have heard of it, also uses PostgreSQL (I set it up). It is mostly a KV store with a metadata join, but it's 1.5+ billion rows, returning 10+ random uncached rows in <100ms.

Originally we used MySQL, which revealed its true face at ~500 million rows. Queries that were <300ms suddenly turned into 2-3 minutes because the query planner decided it would be fun to do a full scan or somesuch. Migrating to PostgreSQL not only reduced the query time of the same queries by more than half but also allowed us to scale the number of rows pretty much linearly with a small constant.

Then I did some horizontal partitioning and things got really awesome.
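
For anyone wondering what horizontal partitioning means in practice, here is a minimal sketch of one common approach: route each key to one of N partitions with a stable hash. This is only an illustration, not a description of the setup above. The partition count, the `images` table name, and both function names are made up for the example.

```python
import hashlib

# Hypothetical partition count; a real deployment would size this for its data.
NUM_PARTITIONS = 8


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings, so it would not route the same
    key to the same partition across restarts.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def table_for(key: str, base: str = "images") -> str:
    """Return the (hypothetical) name of the partition table holding this key."""
    return f"{base}_p{partition_for(key):02d}"


if __name__ == "__main__":
    for k in ("cat.jpg", "dog.jpg", "bird.jpg"):
        print(k, "->", table_for(k))
```

Each lookup then touches a single, smaller table (or server) instead of one 1.5-billion-row table. The trade-off is that changing the partition count remaps most keys, so it is worth picking a generous number up front.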