|
|
|
|
|
by fusiongyro
5043 days ago
|
|
I assume by complex data type you mean something like (say) complex numbers, IP addresses, etc. The key difference is that the components of a complex data type don't stand on their own, or it's a violation of 1NF. I wouldn't mind storing a complex number in one column instead of two largely because I'd rather math with complex numbers "just work" and I'm unlikely to go updating the database for just the imaginary part. Likewise with IP addresses, it is useful to query them with netmasks and the like, but you're unlikely to query or update some middle component of them. The interesting thing about arrays is depending on the work you're doing with them, they could either be a violation of 1NF or not. For example, if you're storing vectors in the physics sense, you may not be querying or updating their contents, in which case it's not a violation of 1NF because it's an atomic value. The danger of encouraging people to use Postgres's arrays is a fair number of the people reading are developers and will take this as carte blanch to store repeated values or sets as arrays, and miss out on a bunch of the power and elegance of relational databases (and probably cause themselves a lot of pain dealing with a type that has no direct and helpful JDBC mapping, for example). > if for no other reason than it could save a lot of space I discourage people from worrying too much up-front about the space utilization of their database design for several reasons. Intuitions about database storage and performance tend to be incorrect. Indexes usually matter a lot more in terms of disk use than the exact types of the columns. It's also a bad idea to spend a lot of energy optimizing the database before you have a significant amount of real-world data. Query performance alone varies dramatically between fake and real-world data sets, in part because the query planner will choose different strategies based on the sizes of tables and indexes and also because you usually don't know ahead of time what your access patterns will look like. When it comes to databases, there really aren't very many hard and fast rules compared to rules of thumb. The rule of thumb on arrays is don't use them, and there are good reasons for that. It could be that you see massive performance and disk savings with tags and the loss of relational correctness doesn't bite you in a particular application, but you should investigate before choosing. All optimizations are trading flexibility for performance. You should make sure you don't need the flexibility and you do need the performance before making the trade. |
|
The characters or substrings that make up a string stand on their own. In fact, it's pretty hard to differentiate a string from an array of characters.
Similarly, the date component of a timestamp stands on its own, and timestamps are routinely broken up in that fashion.
"The interesting thing about arrays is depending on the work you're doing with them, they could either be a violation of 1NF or not."
In other words, you're saying that queries are a violation of 1NF, rather than the schema?
What about integers and modulo division? Is modulo division considered to be accessing a component of an integer?
"people reading are developers and will take this as carte blanch to store repeated values or sets as arrays"
There's no substitute for careful thought when designing a schema.
"Intuitions about database storage and performance tend to be incorrect."
Good point.
"loss of relational correctness"
I still don't think you've justified this. Remember that the normal forms are formally defined, so you should be able to show a clear distinction, which I've not seen yet.
"All optimizations are trading flexibility for performance."
What particular kind of flexibility is lost in this example (product tags)?
"The rule of thumb on arrays is don't use them, and there are good reasons for that."
This really sums up the situation. It's easy to use arrays for the wrong things, so the rule of thumb is to never use them. But they are still useful often enough for developers to get frustrated, and the rule of thumb may cause them to invent a worse solution elsewhere.
We should make the good reasons known, rather than just the rule of thumb.