I disagree that MySQL is a better choice when you have "faulty" data:

* In postgres, you have PL/Python (or PL/v8, or PL/Perl, or many other languages) to help you through the mess. For instance, you could write a canonicalization function to help you put the data into a more-queryable form. You can then even index on that function.
* Powerful triggers might help with post-processing, or with putting data into some queue of "bad data" that needs to be cleaned up later. Maybe by doing so, you realize that the data isn't "bad"; your schema just needs to be updated to reflect new interesting cases.
* You can pull data in from remote sources with foreign data wrappers, which might be necessary to clean the data up properly (e.g. one extra join against the company LDAP directory using the email might be able to canonicalize those employee names).
* You can catch errors using subtransactions and have a different processing path for data that doesn't fit in the schema.

Maybe some of these features exist in MySQL (I haven't been a real user since around 2003, aside from a bit of administration). But in postgres, these features all work together seamlessly, along with the rest of postgres, without a pile of caveats. And that matters a lot when trying to wrangle strange data.
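To make the first and last points a bit more concrete, here is a rough sketch of the kind of logic such a canonicalization function might contain. It is plain Python so it can run outside the database; inside postgres the body of `canonicalize_name` would be wrapped in a `CREATE FUNCTION ... LANGUAGE plpython3u` definition, and it would need to be declared `IMMUTABLE` before you could index on it. The function names, the "Last, First" rule, and the bad-row list are all made up for illustration, and the `try`/`except` only mimics the "different processing path" that a subtransaction would give you.

```python
import re
import unicodedata


def canonicalize_name(raw):
    """Hypothetical canonicalizer: return a normalized 'first last' string.

    Raises ValueError for input that cannot be cleaned up.
    """
    if raw is None:
        raise ValueError("name is missing")
    # Fold compatibility characters (e.g. full-width letters) to a standard form.
    text = unicodedata.normalize("NFKC", raw)
    # Assumed convention: "Last, First" becomes "First Last".
    if "," in text:
        last, _, first = text.partition(",")
        text = f"{first} {last}"
    # Collapse runs of whitespace, trim, and lowercase for comparison.
    text = re.sub(r"\s+", " ", text).strip().casefold()
    if not text:
        raise ValueError("name is empty after cleanup")
    return text


def split_rows(raw_names):
    """Route each value to the clean list or to a 'bad data' queue."""
    clean, bad = [], []
    for raw in raw_names:
        try:
            clean.append(canonicalize_name(raw))
        except ValueError as exc:
            # Keep the original value and the reason so it can be reviewed later.
            bad.append((raw, str(exc)))
    return clean, bad
```

Calling `split_rows(["  Smith,   John ", "   ", None])` puts one cleaned name in the first list and the two rejects, with reasons, in the second. Looking through that second list is where you might discover that the data isn't "bad" after all.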
By the way - thanks for all the great work on PostgreSQL range types.