| "Arrays aren't really in the spirit of relational design." Why do you say that? To a certain extent I agree in broad terms; but I don't believe that it is somehow "unclean" or un-relational just because you are using an array (or any other complex data type). "This means that using them will change the ACID characteristics compared to a relational design." No, arrays are protected by ACID as is any other data in postgres. "it would seem like adding another item to a purchase would cause the entire purchase row to lock" It will still allow concurrent reads of the row in postgres (MVCC). In order for the write lock to be a practical problem, there would have to be a lot of concurrent updates to the very same purchase. I think we can all agree that the first example is "hacky" (which the author says in the article), so let's ignore that one. How about the second example? I think it's pretty reasonable to store tags that way, if for no other reason than it could save a lot of space. |
The interesting thing about arrays is depending on the work you're doing with them, they could either be a violation of 1NF or not. For example, if you're storing vectors in the physics sense, you may not be querying or updating their contents, in which case it's not a violation of 1NF because it's an atomic value. The danger of encouraging people to use Postgres's arrays is a fair number of the people reading are developers and will take this as carte blanch to store repeated values or sets as arrays, and miss out on a bunch of the power and elegance of relational databases (and probably cause themselves a lot of pain dealing with a type that has no direct and helpful JDBC mapping, for example).
> if for no other reason than it could save a lot of space
I discourage people from worrying too much up-front about the space utilization of their database design for several reasons. Intuitions about database storage and performance tend to be incorrect. Indexes usually matter a lot more in terms of disk use than the exact types of the columns. It's also a bad idea to spend a lot of energy optimizing the database before you have a significant amount of real-world data. Query performance alone varies dramatically between fake and real-world data sets, in part because the query planner will choose different strategies based on the sizes of tables and indexes and also because you usually don't know ahead of time what your access patterns will look like.
When it comes to databases, there really aren't very many hard and fast rules compared to rules of thumb. The rule of thumb on arrays is don't use them, and there are good reasons for that. It could be that you see massive performance and disk savings with tags and the loss of relational correctness doesn't bite you in a particular application, but you should investigate before choosing. All optimizations are trading flexibility for performance. You should make sure you don't need the flexibility and you do need the performance before making the trade.