| "can readily turn operations" Which operations, specifically? We have the product tags example to work from, so it would help to be more concrete. "into O(M*N) rather than O(N)" I assume here that your target is "scalability to higher numbers of tags for a given product"? If so, please state that explicitly. Again, M and N aren't necessary, because we have an example to work from. Are the operations still susceptible to worse scalability on this axis if you employ a GIN index? "PgSQL documentation suggests cases that will hurt you." Please elaborate. "I would suggest that the relational model is maintained" I don't see how your solution is relational and the other one is not. Can you please explain? How is using an array different from using, say, a string, which is essentially an array of characters? How is it different from using a date, the components of which you can access (e.g. grouping by the month)? "the optimiser is likely to come up with a better solution." Please elaborate. |
Table is a list of rows. To access a row is O(N) assuming perfect B-Tree. To access an element in an array is O(M) therefore to access an element of an array in a row is O(N*M) as you have to materialize the entire page that the array is in from disk. That array could be larger than a page which leads to more problems.
GIN is a possibility but the array elements are ordered which is overhead (they are a list, not a relational set) therefore there is a condition there if you want to reference by index which is the primary point of arrays. If not, you should use a set if they aren't ordered which means you should probably use a table and join anyway.
Fundamentally membership is a set relation, not a list relation. The order of the tags is not important.
Pain = PgSQL documentation 8.14.3 which references indices of arrays directly in a where clause.
A string - I wouldn't do string manipulation anyway in a where clause.
A date - that's a number which you can apply relops to easily.
The optimiser will most likely be better at retrieving and caching data based on a join condition versus column contents.