Hacker News new | ask | show | jobs
by mmaunder 3823 days ago
Unless I'm mistaken MySQL has had this for almost a decade with "ON DUPLICATE KEY UPDATE". I'm seeing a lot more about PSQL here and in the news. I've always found it to be unfriendly and slow. Why the new attention? Is there really something about PSQL that makes it better than MySQL these days? It used to be transactions, but InnoDB made that moot years ago.

We do over 20,000 queries per second on one of our production mysql DB's and I'm not sure I'd trust anything else with that: http://i.imgur.com/sLZzXhS.png

Just curious if I'm missing out on some new awesomeness that PostgreSQL has or if it's just marketing.

7 comments

> Is there really something about PSQL that makes it better than MySQL these days?

In a word: correctness.

Yes, MySQL has an UPSERT implementation. Like so many things MySQL rushed out the door, it's also buggy and unpredictable. Did you know UPSERTing into a MySQL table with multiple unique indexes can result in duplicate records? Did you know MySQL's ON CONFLICT IGNORE will insert records that violate other not-NULL constraints? [1]

I've used both MySQL and PostgreSQL for over a decade, and working around the many MANY misbehaviors and surprises in MySQL requires continuous dev effort. PostgreSQL on the other hand is correct, unsurprising, and just as performant these days.

MySQL is what happens when you build a database out of pure WAT [2].

[1] https://wiki.postgresql.org/wiki/UPSERT#MySQL.27s_INSERT_......

[2] https://www.destroyallsoftware.com/talks/wat

Nah. MySQL rocks. I've been using it since 1998 at Credit Suisse, 2000 at eToys.com where we used it to run the entire company from warehouse to web. I used it at the BBC in 2003 for a high traffic Radio 1 application and I've used it since then on my own companies with serious volume including a job search engine featured in NYTimes and Time Mag in 2005 and feedjit.com doing real-time traffic on over 700,000 sites. We use it for Wordfence now which is where the image link comes from I posted earlier with 20K TPS. All very high traffic with consequences if it screws up. I've never run into any of the issues you mention.

You say "upserting into a mysql table". Which storage engine? MyISAM? InnoDB? I find MySQL to be both reliable and incredible durable i.e. it handles yanking the power cord quite well. The performance also scales up linearly for InnoDB even for very high traffic and concurrency applications.

We use redis, memcached and other storage engines - by no means are we tied to mysql. But for what it does, it does it incredibly well.

I'm also completely open to using PostgreSQL and I was hoping someone could give me a compelling reason to switch to it or to use it.

I've worked at a company that was doing > 20,000 TPS (transactions/sec) on a single PG instance with no problem. That is baby-tier usage for a DB.

As far as why you should give up MySQL like I did four years ago: http://grimoire.ca/mysql/choose-something-else

> I find MySQL to be both reliable and incredible durable i.e. it handles yanking the power cord quite well.

Mmm. Assuming that your underlying disks don't lie, losing only the data that was in-flight to the WAL (or whatever is the equivalent in your DB of choice) is the absolute worst data loss you should see from a real SQL database in that situation. [0]

Think about the case where an error causes the DB software to crash... You really can't make good data robustness guarantees if an unexpected crash endangers more than just the data that's in-flight to the WAL.

I personally have had the disks that back a rather large (but relatively low-update-volume) Postgres DB drop out on multiple occasions and never lost any data at all. This shouldn't be unexpected behavior. :)

> You say "upserting into a mysql table". Which storage engine? MyISAM? InnoDB?

"In addition, an INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe. (Bug #11765650, Bug #58637)" [1]

Given that this is an unqualified assertion, straight from the MySQL docs, it seems safe to say that this is a weakness in MySQL's UPSERT-alike, rather than any limitation of a particular underlying storage engine. (Here are some bug reports that also make unqualified statements about the behavior at issue: [2][3])

> ...I was hoping someone could give me a compelling reason to switch to it or to use it.

shrug Both MySQL and Postgres are capable databases. Postgresql has better documentation, appears to have substantially better internal architecture, and -from what I remember of my early days with MySQL- has far fewer hidden sharp edges than MySQL.

I'm not here to tell you to change what DB you're currently using for your production stuff... I don't think anyone is. For my projects, I have had substantially better experiences with Postgres than MySQL.

EDIT: It occurred to me in the shower that I didn't make "loss of in-flight WAL data" sufficiently clear. Unless you explicitly ask for a mode where writes are acknowledged before the DB believes that they're safe on disk, [4] then the only "data loss" in the scenarios I described in the comment would be data that had been transmitted to the server, but not acknowledged as committed to the DB. So, correctly-written client programs would experience this "data loss" only as a transaction that failed to commit, maybe followed by DB unavailability. Sorry for the ambiguity. :(

[0] http://www.postgresql.org/docs/9.4/static/wal-reliability.ht...

[1] http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.h...

[2] http://bugs.mysql.com/bug.php?id=58637

[3] http://bugs.mysql.com/bug.php?id=72921

[4] I say "believes they're safe" because underlying storage can always lie, and there's not a goddamn thing you can reliably do about it if it does.

If all you care about is QPS, by all means, stick with MySQL.

People like myself use Postgres because it has a much richer feature set. See http://stackoverflow.com/a/5023936/270610 for some examples. Personally I find MySQL beyond frustrating due to its lack of… well almost all of those. Recursive CTEs in particular, but arrays and rich indexing are pretty core too.

Postgres's query optimizer is far more advanced too. MySQL doesn't even optimize across views, which discourages good coding practices.

The documentation is fantastic. Complete and well-written, covers the nuances of every command, expression, and type. MySQL's doesn't hold a candle to it.

Don't know what you mean about "unfriendly". Help is built into the command-line tool, and like I said, the documentation is fantastic. Maybe MySQL is a little more "hand-holdy", but I don't care for such things so I wouldn't know.

It's a long time since MySQL was faster than Postgres.

Back in the early 2000s, LAMP people on Slashdot were benchmarking MyISAM tables to Postgres' 6.5/7.x's fully transactional engine. Unfortunately, the reputation as being slow stuck among developers.

Postgres particularly shines on multicore systems, thanks to some clever internal design choices. Having a sophisticated cost-based query planner also helps.

As for unfriendly: Care to amplify? In my work, I've found the opposite to be true.

For example, the very first thing you tend to encounter as a new developer is "how to create a user". For MySQL, it turns out that using GRANT to grant a permission creates a user, which is counterintuitive; GRANT also sets the password, and promotes the use of cleartext passwords. By comparison, Postgres has "createuser", as well as a full-featured set of ALTER USER commands. The difference between "mysql" and "mysqladmin" is also completely unclear.

The almost complete lack of warts and legacy cruft in Postgres significantly removes the possibility of confusion, uncertainty and information overload. MySQL's manuals are littered with "if X is enabled then this behavior is different, and in versions since 5.7.3.5 this behavior has been changed slightly, and 5.7.3.6 has a bug that silently swallows errors", etc. MySQL's historical date and NULL handling alone is worth a chapter of any book.

Postgres also has a level of strictness above MySQL, which is in itself instructive. You know when you're doing something wrong. Postgres never accepts bad input. It always requires a strictly correct configuration setup.

Plus: Just type \h in psql. It has a complete reference of the entire SQL syntax.

> LAMP people on Slashdot were benchmarking MyISAM tables to Postgres' 6.5/7.x's fully transactional engine.

It wasn't just the /. crowd. Back in the 3.5 days the MySQL devs were doing that too, and writing long discourses on why transaction safety was a bunch of crap and a crutch for bad application developers.

Off the top of my head:

  - CTE's
  - Arrays/JSON type
  - partial indexes
  - transactional DDL
  - NOTIFY
  - Materialized views
  - Schemas
  - PostGIS
  - Row level security (which is new in PG)
I'm no expert but in my (limited) experience, there are some super handy datatypes and features that PostgreSQL supports that MySQL doesn't:

- Arrays, particularly with GIN indexes. This makes things like tagging fantastic in Postgres. Instead of putting your tags in another table, you throw them in an array and you can do all kinds of things like set intersection-like queries.

- JSON. Postgres can store data as JSON and index and query the JSON. This essentially gives you MongoDB type queries.

I'm sure there's more but these are my favourite Postgres features.

Mysql always seemed to be fast like a bike going downhill with no brakes.

Postgres has always taken a more 'solid' approach. One instance that made my jaw drop when I realized it: in the past (has this been fixed?), DDL (alter table, create table, etc...) were not transactional in Mysql. You could get 50% through a series of them, and find your database 100% fucked up.

That said, over the years Mysql has been improving too, for sure.

MySQL still does not have transactional DDL.
Neither does Oracle!
Some of it is historic - Mysql has gotten much better in recent years at supporting the parts of being an RDBMS that matters the most when money is riding on it.

So honestly, at this point I do think some of it is impressions from the past which are no longer valid. But still, Mysqlhas done/does all sorts of things that defy the spec, convention or just common sense (I don't know if this has been fixed, but at least for many years, April 30th was treated as a valid date, and there was some profound weirdness of which I can't quite recall the details involving locale stuff).

Postgres generally takes the position that data should always be safe first and speedy sometime later. It also assumes the operator understands their tools. That second one means in comparison with Mysql, people think it is unfriendly. It isn't (if you want to see unfriendly, go work with Oracle), it just expects that its friends learn about it. Which is of course good advice when you're dealing with complicated software on which a lot of money tends to ride.

And as the PG devs have said for years, they don't compete with Mysql. They compete with Oracle. There's no reason to switch if you're happy with Mysql.

What's wrong with April 30th? And operator understanding their tools applies to MySQL as well. You can set it be more strict, but default is lax. Not sure if reverse is possible with PG.
PP probably intends to refer to February 30th (or 31st or 32nd), a popular handle to MySQL's (ahem) surprising date handling.