by jeffdavis 4888 days ago

I disagree that MySQL is a better choice when you have "faulty" data:

* In postgres, you have PL/Python (or PL/v8, or PL/Perl, or many other languages) to help you through the mess. For instance, you could write a canonicalization function to help you put the data into a more-queryable form. You can then even index on that function.

* Powerful triggers might help with post-processing, or with putting data into some queue of "bad data" that needs to be cleaned up later. Maybe by doing so, you realize that the data isn't "bad"; your schema just needs to be updated to reflect new interesting cases.

* You can pull data in from remote sources with foreign data wrappers, which might be necessary to clean the data up properly (e.g. one extra join against the company LDAP directory using the email might be able to canonicalize those employee names).

* You can catch errors using subtransactions and have a different processing path for data that doesn't fit in the schema.
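
To make the canonicalize / subtransaction / "bad data" queue ideas a bit more concrete, here is a rough sketch. It is not Postgres: it uses Python's built-in sqlite3 module as a stand-in so it runs without a server, and the table names (employees, bad_data), the constraints, and the canonicalize_name helper are all made up for illustration. In Postgres the same shape would be a SAVEPOINT per row, or a PL/pgSQL BEGIN ... EXCEPTION block.

```python
import sqlite3


def canonicalize_name(raw):
    """Hypothetical canonicalization: trim, collapse whitespace, lowercase."""
    return " ".join(raw.split()).lower()


def make_db():
    """Create an in-memory database with an illustrative schema."""
    # isolation_level=None: we issue BEGIN/SAVEPOINT/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE employees ("
        " name TEXT NOT NULL UNIQUE,"
        " age INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150))"
    )
    conn.execute(
        "CREATE TABLE bad_data (raw_name TEXT, raw_age TEXT, reason TEXT)"
    )
    return conn


def load_rows(conn, rows):
    """Insert each row inside its own savepoint (a subtransaction).

    Rows that violate the schema are rolled back individually and
    diverted to bad_data for later cleanup; the rest of the batch
    is unaffected.
    """
    conn.execute("BEGIN")
    for name, age in rows:
        conn.execute("SAVEPOINT one_row")
        try:
            conn.execute(
                "INSERT INTO employees (name, age) VALUES (?, ?)",
                (canonicalize_name(name), age),
            )
            conn.execute("RELEASE SAVEPOINT one_row")
        except sqlite3.IntegrityError as exc:
            # Undo only this row, then take the alternate path.
            conn.execute("ROLLBACK TO SAVEPOINT one_row")
            conn.execute("RELEASE SAVEPOINT one_row")
            conn.execute(
                "INSERT INTO bad_data (raw_name, raw_age, reason)"
                " VALUES (?, ?, ?)",
                (name, age, str(exc)),
            )
    conn.execute("COMMIT")
```

Calling load_rows(make_db(), [("Alice Smith", 30), ("  alice   SMITH ", 31), ("Bob", -5)]) keeps the first row and sends the other two to bad_data: the second collides with the first once canonicalized, and the third fails the age check. Nothing is lost, and the bad rows are sitting in one place where you can decide whether the data or the schema is what needs fixing.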

Maybe some of these features exist in MySQL (I haven't been a real user since around 2003, aside from a bit of administration). But in postgres, these features all work together seamlessly with the rest of the system, without a pile of caveats. And that matters a lot when trying to wrangle strange data.

1 comment

sadmysqluser 4888 days ago

If you read pippy's statement carefully

  Also I have no idea what my users will do,
  and I'd rather have faulty data inserted than none at all.

you'll see we're in agreement - PostgreSQL's design and feature set make it clearly superior to MySQL at preventing the spread of faulty data. But pippy would rather have a garbage database than be slowed down by preventive measures. It's costing him development time after all...

By the way - thanks for all the great work on PostgreSQL range types.
