Hacker News new | ask | show | jobs
by munk-a 2080 days ago
I'm a big fan of Postgres too for a number of reasons, but this issue is pretty clearly documented so I'd like to counter with an issue I hit in Postgres recently that is terribly documented. UNNEST works a bit funky, and in particular it works super funky if you have multiple calls in the same select statement (or any set expanded function calls it turns out). There's a bit of a dive into here[1] (though that is out of date - PG10 no longer follows the different array sized result, it uses null filling) which I managed to find after struggling with an issue where an experimental query was resulting in nulls in the output while unnesting arrays without nulls.

All DBs have their warts and while MySQL has an over abundance of warts they tend to be quite well documented. The warts that postgres has tend to be quite buried and their documentation is very good for syntax comprehension but rather light when it comes to deeper learning.

1. https://stackoverflow.com/questions/50364475/how-to-force-po...

3 comments

I am also a big fan of Postgres, and tend to have a bit of fun picking on MySQL having been scarred by it in a past life. But since we're picking at Postgres warts...

One which bit me recently, and is still utterly baffling to me, is that a column defined as an array type will accept values of that array's type in any number of dimensions greater than that specified for the column. In other words, `{{{{{{text}}}}}}` can be inserted into columns of the following types:

- `TEXT[]`

- `TEXT[][]`

- `TEXT[][][]`

- `TEXT[][][][]`

- `TEXT[][][][][]`

- `TEXT[][][][][][]`

The inverse is true as well! A column specified `TEXT[][]` (and so on) will accept `{text}`. Of course, none of this (as far as I've been able to find) is documented.

But wait, there's more!

`UNNEST` does not allow you to specify depth, it always unnests to the deepest dimension. This, too, is undocumented. In fact, it's anti-documented. The documents provide an example function to unnest a two-dimensional array that is wholly unnecessary (and likely performs worse than the built-in `UNNEST`, but I'm just guessing). Said documentation would seem to imply that the depth of `UNNEST` is 1, but of course that's not the case.

But wait, there's more still!

What if you want to get at a nested array? Idk, I'm sure it's possible, but if you thought `SELECT that_array[1]` is the way to do it, look under your seat because you're getting a `NULL`!

- - -

Postscript: I discovered the first part of this in a production system where a migration had incorrectly nested some data, and where that data was in turn causing certain requests to unexpectedly fail. Of course, given that this was in production, I didn't have a lot of time to research the issue. Found the problem, fixed it, moved on with my day. In the course of fixing it, I discovered the `UNNEST` issue, which... okay fun, fix it a slightly different way than I expected.

So in the course of verifying the particulars to write this comment, I played around with some things, and discovered the `NULL` issue.

At least when Postgres has wildly unexpected behavior, it's exceptionally unexpected behavior.

All of the array types are basically the same. The docs actually do mention this, but only in passing as a current limitation.

https://www.postgresql.org/docs/13/arrays.html

> The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.

Similar text extends at least all the way back to the documentation for 7.1.

The other key to understanding this is that multidimensional arrays in Postgres are not simply arrays of arrays like typical programming languages. That’s why you can’t do things like array_agg on an array column.
Thanks! I don't know why I wasn't able to find it. Still, I would never have expected any of the behavior I described.
I always internalized postgres's array behavior based on the examples, which include multidimensional arrays on text[] columns. Also, common functions like array_length require an index argument--"which array within the possibly multiple arrays do you want the length of" etc.
Also while we're picking on other DBs, another fun WTF I've encountered (this time in an external system): SQL Server stores timestamps to ~1/300th of a second resolution. This[1] StackOverflow question describes different behavior than I saw (it rounded differently), so apparently it's not even consistent. I'd assume across versions? IDK, never had time to look too deeply into this one either.

[1]: https://stackoverflow.com/questions/715432/why-is-sql-server...

Note that only happens using the DATETIME data type.

As long as I remember, the documentation for DATETIME [0] has had a disclaimer at the top to not use it, but use DATETIME2 instead.

Of course, that doesn't excuse older systems, but this is an issue that can be avoided for new work.

[0] https://docs.microsoft.com/en-us/sql/t-sql/data-types/dateti...

And datetime2 was introduced in mssql 2008… So 12 years ago.

Of course when maintaining the rust crate for mssql, implementing support for these old types was necessary and I had some wtf moments. Another interesting one is the smalldatetime, and the biggest annoyance the division to varchar and nvarchar -- the n-variants of cource using ucs2 to store the data.

This only changed in the 2019 version, where one can use varchar to store utf8 data with a special collation.

Still prefer mssql over mysql, any day.

Given the time of that question, my guess would be they are using floating point underneath, and have worked out the math so you get approximately that much precision. There's some weirdness in this between 32 bit and 64 bit, based on how the calculation is done. See [1].

I had a mysql instance once where I migrated from a 32 bit system to a 64 bit system, and one of the main tables' primary key was a float that was a high resolution timestamp. Unfortunately, on inserting the old data into the new 64 bit system with a MySQL compiled for 64 bit, the values started having lots of collisions on the primary key, where two previously distinct high-resolution float values now rounded to the same float in 64 bit. Counterintuitive, and a nightmare.

1: https://stackoverflow.com/questions/31415712/floating-point-...

I got bitten by this recently maintaining a legacy system. The same .NET DateTime object is stored in a DATETIME field and also embedded in an XML document in the same database record.

A comparison was done between the value stored in the XML and in the DATETIME field, but even using the same DateTime object there is sometimes a 1 to 3 millisecond difference between the two.

> I'm a big fan of Postgres too for a number of reasons, but this issue is pretty clearly documented

Of course it is, the documentation is where TFAA got the information in the 4th paragraph of the story, out of 15 or so.

The range itself is what nerd-sniped the author and led them to try and find out why mysql had such an odd yet specific range.

> I hit in Postgres recently that is terribly documented.

I'm going to have to disagree with you there. This issue is quite well documented in the "SQL Functions Returning Sets" section [0]. The relevant bit starts thusly:

> ...Set-returning functions can be nested in a select list, although that is not allowed in FROM-clause items. In such cases, each level of nesting is treated separately, as though it were a separate LATERAL ROWS FROM( ... ) item...

And there's even a note about the crazy behavior pre-PostgreSQL 10:

> Before PostgreSQL 10, putting more than one set-returning function in the same select list did not behave very sensibly unless they always produced equal numbers of rows. Otherwise, what you got was a number of output rows equal to the least common multiple of the numbers of rows produced by the set-returning functions. Also, nested set-returning functions did not work as described above; instead, a set-returning function could have at most one set-returning argument, and each nest of set-returning functions was run independently. Also, conditional execution (set-returning functions inside CASE etc) was previously allowed, complicating things even more. Use of the LATERAL syntax is recommended when writing queries that need to work in older PostgreSQL versions, because that will give consistent results across different versions.

I agree that allowing SRFs in the SELECT clause is a wart that should never have been permitted, but I think the PostgreSQL docs do a pretty great job describing both the old behavior and the new behavior that has to balance backwards compatibility with sensibility.

(And, indeed, the 9.6 docs have this to say on the behavior of SRFs in the SELECT list: "The key problem with using set-returning functions in the select list, rather than the FROM clause, is that putting more than one set-returning function in the same select list does not behave very sensibly.")

I do think one notable defect with the PostgreSQL docs is that they were designed in a time before modern search engines. They are better understood as a written manual in electronic form. Almost always the information you need is there, but possibly not in the chapter that Google will surface. But there are all sorts of tricks you can use if you update your mental model of how to read the PostgreSQL docs. For example, there's an old-style index! [1]

[0]: https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC... [1]: https://www.postgresql.org/docs/current/bookindex.html

> I do think one notable defect with the PostgreSQL docs is that they were designed in a time before modern search engines. They are better understood as a written manual in electronic form. Almost always the information you need is there, but possibly not in the chapter that Google will surface. But there are all sorts of tricks you can use if you update your mental model of how to read the PostgreSQL docs. For example, there's an old-style index!

I think of the Postgres docs as significantly better than most other documentation for this reason. The information is there and it's organized in a way that makes sense.

I hadn't seen the index - that's neat.

In terms of the documentation quality I do agree that it's documented but not obviously, you mentioned that it's in the "SQL Functions Returning Sets" section however this section isn't indicated to by the Array functions and operators page - additionally while, if you know that you're dealing with an issue related to SRFs it's pretty easy to find - attempting to get at that information via google isn't going to get you anything unless you specifically hone in on set returning functions. I ended up finding that doc page after finding a SO answer that mentioned SRFs while searching for "postgres unnest cartesian product".

The information is in the documentation, but the documentation isn't always super good at linking to other relevant portions of the documentation and, honestly, reading the documentation about the feature you need is reasonable, but I don't expect most people are reading the full postgres docs before starting to play around with it. So I don't disagree that the information is there, but I do think it is mostly inaccessible due to the structure of the documentation.