Hacker News new | ask | show | jobs
by eyelidlessness 2081 days ago
I am also a big fan of Postgres, and tend to have a bit of fun picking on MySQL having been scarred by it in a past life. But since we're picking at Postgres warts...

One which bit me recently, and is still utterly baffling to me, is that a column defined as an array type will accept values of that array's type in any number of dimensions greater than that specified for the column. In other words, `{{{{{{text}}}}}}` can be inserted into columns of the following types:

- `TEXT[]`

- `TEXT[][]`

- `TEXT[][][]`

- `TEXT[][][][]`

- `TEXT[][][][][]`

- `TEXT[][][][][][]`

The inverse is true as well! A column specified `TEXT[][]` (and so on) will accept `{text}`. Of course, none of this (as far as I've been able to find) is documented.

But wait, there's more!

`UNNEST` does not allow you to specify depth, it always unnests to the deepest dimension. This, too, is undocumented. In fact, it's anti-documented. The documents provide an example function to unnest a two-dimensional array that is wholly unnecessary (and likely performs worse than the built-in `UNNEST`, but I'm just guessing). Said documentation would seem to imply that the depth of `UNNEST` is 1, but of course that's not the case.

But wait, there's more still!

What if you want to get at a nested array? Idk, I'm sure it's possible, but if you thought `SELECT that_array[1]` is the way to do it, look under your seat because you're getting a `NULL`!

- - -

Postscript: I discovered the first part of this in a production system where a migration had incorrectly nested some data, and where that data was in turn causing certain requests to unexpectedly fail. Of course, given that this was in production, I didn't have a lot of time to research the issue. Found the problem, fixed it, moved on with my day. In the course of fixing it, I discovered the `UNNEST` issue, which... okay fun, fix it a slightly different way than I expected.

So in the course of verifying the particulars to write this comment, I played around with some things, and discovered the `NULL` issue.

At least when Postgres has wildly unexpected behavior, it's exceptionally unexpected behavior.

2 comments

All of the array types are basically the same. The docs actually do mention this, but only in passing as a current limitation.

https://www.postgresql.org/docs/13/arrays.html

> The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.

Similar text extends at least all the way back to the documentation for 7.1.

The other key to understanding this is that multidimensional arrays in Postgres are not simply arrays of arrays like typical programming languages. That’s why you can’t do things like array_agg on an array column.
Thanks! I don't know why I wasn't able to find it. Still, I would never have expected any of the behavior I described.
I always internalized postgres's array behavior based on the examples, which include multidimensional arrays on text[] columns. Also, common functions like array_length require an index argument--"which array within the possibly multiple arrays do you want the length of" etc.
Also while we're picking on other DBs, another fun WTF I've encountered (this time in an external system): SQL Server stores timestamps to ~1/300th of a second resolution. This[1] StackOverflow question describes different behavior than I saw (it rounded differently), so apparently it's not even consistent. I'd assume across versions? IDK, never had time to look too deeply into this one either.

[1]: https://stackoverflow.com/questions/715432/why-is-sql-server...

Note that only happens using the DATETIME data type.

As long as I remember, the documentation for DATETIME [0] has had a disclaimer at the top to not use it, but use DATETIME2 instead.

Of course, that doesn't excuse older systems, but this is an issue that can be avoided for new work.

[0] https://docs.microsoft.com/en-us/sql/t-sql/data-types/dateti...

And datetime2 was introduced in mssql 2008… So 12 years ago.

Of course when maintaining the rust crate for mssql, implementing support for these old types was necessary and I had some wtf moments. Another interesting one is the smalldatetime, and the biggest annoyance the division to varchar and nvarchar -- the n-variants of cource using ucs2 to store the data.

This only changed in the 2019 version, where one can use varchar to store utf8 data with a special collation.

Still prefer mssql over mysql, any day.

Given the time of that question, my guess would be they are using floating point underneath, and have worked out the math so you get approximately that much precision. There's some weirdness in this between 32 bit and 64 bit, based on how the calculation is done. See [1].

I had a mysql instance once where I migrated from a 32 bit system to a 64 bit system, and one of the main tables' primary key was a float that was a high resolution timestamp. Unfortunately, on inserting the old data into the new 64 bit system with a MySQL compiled for 64 bit, the values started having lots of collisions on the primary key, where two previously distinct high-resolution float values now rounded to the same float in 64 bit. Counterintuitive, and a nightmare.

1: https://stackoverflow.com/questions/31415712/floating-point-...

I got bitten by this recently maintaining a legacy system. The same .NET DateTime object is stored in a DATETIME field and also embedded in an XML document in the same database record.

A comparison was done between the value stored in the XML and in the DATETIME field, but even using the same DateTime object there is sometimes a 1 to 3 millisecond difference between the two.