Hacker News new | ask | show | jobs
by pbz 1445 days ago
Since we have some PG devs here: Can we please have a way to reorder columns?

Coming from MySQL, this is one of the first missing features one hits. It's like moving to a new house where you're told you can never clean or move the furniture without moving to a new house.

It leaves the unfair impression that this is a "toy" db. "What other basic features is it missing if it doesn't have this?" Please don't think of it as trivial; first impression are very important.

7 comments

Better to channel your displeasure into advocating for a future SQL standard to include this feature. Other DBs that support this do so via proprietary syntax extensions.
Technically it doesn’t require syntactic extensions, postgres stores the column index as the `attnum` attribute column of the `pg_catalog.pg_attributes` system table.

So this could be made to work by hooking into the storage system and rewriting the table and all pointers to the table when `attnum` is updated.

Without this rewriting this only works if the table was just created and has no data, and no external metadata referencing the columns themselves e.g. views, fks, indexes, defaults, rules, …

An alternative would be to add automatic packing (à la rustc) to postgres, decorrelating the “table position” and the “physical position” of the rows, this would also allow free “table position” reordering.

And while it’s by far the most complex option, one of the nice bits with it would be that the system columns could be packed as well. Currently there’s quite a bit of waste because there are 6 system columns (by default), the first 5 are 4 bytes, but the 6th (ctid) is 6 bytes, meaning 2 bytes of padding if your first column is a SERIAL, and 6 if the first column is a BIGSERIAL (or an other double-aligned column).

> And while it’s by far the most complex option, one of the nice bits with it would be that the system columns could be packed as well. Currently there’s quite a bit of waste because there are 6 system columns (by default), the first 5 are 4 bytes, but the 6th (ctid) is 6 bytes, meaning 2 bytes of padding if your first column is a SERIAL, and 6 if the first column is a BIGSERIAL (or an other double-aligned column).

FWIW, system columns aren't stored as normal columns in tuples. Some of them are implied (e.g. tableoid doesn't need to be stored in each tuple, ctid is inferred from position), others are not stored in the way normal columns are stored (e.g. xmin, xmax).

Couldn't an ordering column be added to the pg_attributes that determines in which way the columns are sorted when they are displayed? Standard SQL code can then be used to manipulate the display order of the columns.
There's like 5 different features wants I could add like this. A columnar database engine comes to mind (slightly more work involved) and poaching a few Sqllite QOL features.

Ultimately Postgres is a community driven product, and people want to work on what they want the work on.

I don't get angry that some kind soul hasn't volunteered their Saturday for free for me.

That's not exactly true. You can write your own patch for a feature and there's a good chance they'll decide they don't want it. It's their prerogative, but also slightly upsetting when you know they're blocking features you want. Just observe the bike shedding over a trailing commas patch [1].

1. https://postgrespro.com/list/thread-id/1853280

There are lots of good reasons in that thread why the patch was declined.

The patch made things more lenient which makes things less compatible (this whole thread is about ANSI SQL standards and they do matter) and changes what was an error behavior to now silently succeed.

+1 to postgres devs for curating patches.

I read the thread and have to disagree with "lots of good reasons". I'm pretty sure there are plenty of things in the postgres dialect that are not cross compatible with any other dialect.

I agree with curating patches in general, but it's a double edged sword and inaccurate to say you can just spend your weekend to get something you care about.

Oh, absolutely; I'm just trying to raise the awareness/importance of this missing feature. Most devs that I know IRL that looked at PG had a very similar reaction to mine.
I'm receptive to the feature. But wouldn't it require locking the full table as it works? Seems like it would hurt availability of the biggest databases that would benefit most. Perhaps this is why it is thought somewhat impractical and therefore low-priority.
Depends on how the PG team decides to implement this feature. If they're OK with a virtual order (for "UI" purposes) while keeping the true order hidden then this would not require table locking (or it would be a very quick lock). If the order reflects the real / physical positioning then yes, it would probably require locking for a full table rebuild.
What do you gain from that other than soothing your OCD when executing \d t (?)
I like to have my `*_id` columns at the front, and `created_at` etc. at the back. And in the middle all the fields by descending importance. Just a personal habit so I can quickly lookup data in a table.
I'm totally with you on this, and would love to be able to directly reorder the columns in the tables.

A possible workaround is to have a view for every table (which I often read is a good practice anyway) and order the columns as you want in the views.

Exactly... You gain so much speed by having everything in the right order.
This seems pretty voodoo to me. Do you have any examples that show it?
I'm not sure how I could show if something is easier/harder to work with. Speed in this context refers to human processing speed, not database speed.
There is some small benefit to playing column tetris if you have columns with different sizes that waste space due to padding. I'm not convinced this would be worth the complexity of this feature, but in some cases reordering columns might have measurable benefits in reducing the size of the data.
I'm not talking about the physical layout of the data, just a thin UI layer that the DB tool could use. Maybe we could have two modes: physical vs UI ordering?
As mentioned by another comment: you can use views for that.
Views are not a good solution for this. The point is to be able see a specific order in 3rd party apps and when writing quick add-hoc queries (select *). Writing views for every table would just pollute the db.
Based on my understanding of DB storage (which is decades old and as I write the comment my explanation seems stupid but it’s what I was taught. Please someone correct me).

Image a table with the following:

id (int), title (varchar), created_date (date)

That data is stored in a similar order on disk (is this still true?).

So, if the initial insert has a title of 10 characters, then the date will get placed after those 10 characters on disk:

Int, 10 characters, date

Later, the row is updated with a longer title of 500 characters.

Because it’s longer, you now have to get more space for the title on another part of the disk.

Since there are columns after the title, it’ll either leave dead space where the old 10 character title was or it’ll need to move those other columns too.

I can’t remember which I was taught. I just remember that columns after variable columns get impacted when the data in a preceding variable column changes.

If the title were at the end of the row, the db could expand the data without needing to move other columns over (if there happened to be available space right there).

If you were updating with a shorter value, it could shorten the row by moving the terminating character to the new shorter location, freeing up the space the longer title was using.

Bottom line is you want to keep variable columns which get updated most frequently towards the end of your table structure.

Knowing which column that would be isn’t always clear when creating the table.

Again, this understanding was taught to me over 20 years ago. I’m probably remembering parts of it wrong and DB storage has likely (hopefully) advanced since then.

> That data is stored in a similar order on disk (is this still true?).

There should be no requirement for this. Columns in relations are not conceptually ordered, so it shouldn't matter for the things you're doing with the data anyway, and the database should be able to reorder the data in whatever way it likes, since desire to isolate the user from physical data structures was one of the main reasons for the rise of RDBMS.

1. That’s not exactly true, AFAIK all RDBMS return columns in table order when order is unspecified (`*`), and while they could reorder on retrieval

2. postgres definitely does not, and column tetris is absolutely a thing in the same way struct packing is (with the additional complexity of variable-size columns)

But the order of attributes in the presentation of the result relation has nothing to do with physical layout of base relvars -- primarily because any relationship between the two is purely coincidental. The vast majority of useful queries will not reuse the order of attributes in base relvars, so optimizing for the unusual trivial case by prohibiting better rearrangements that could be useful for a much larger number of use cases seems rather pointless.

And what PostgreSQL does is of course an implementation detail of PostgreSQL.

> the database should be able to reorder the data in whatever way it likes

My point was that you’re trying to prevent the DB from reordering the data because there’s a performance cost when that happens.

Why would you be trying to prevent the DB from reordering the data? You're not supposed to have better knowledge of what's good for your use case than an RDBMS that can collect usage statistics on queries and such. Ditto for compilers rearranging structures and such. When you start having hundreds of tables and thousands of queries, I don't see how you can do a better job than an automated system at that point.
> Why would you be trying to prevent the DB from reordering the data?

Sure. “Reduce the need” would be a better word than “prevent”.

If I can do a good job organizing the table columns (as described above) it’ll lower the need for the DB to reorder data.

Reduced need to reorder, improves performance.

DB storage is a lot more sophisticated than what you were taught.

Most databases use Slotted Pages to organize storage. Pages are fixed size and numbered by their offset within the database file. The page header contains the number of rows, followed by an array of offsets for individual rows within the page. Rows themselves generally are stored at the end of the page filling downwards. The storage engine can move around rows in arbitrary ways to consolidate free space.

Fundamentally there's no connection between SQL schema order and how table storage is organized on disk. For example in a column store there's often no contiguous row stored anywhere, instead there's just separate indexes per column.

That would seem like an argument in favor of allowing column reordering.
Soothing my OCD is important, but it also helps when working in a team and having a shared ordering style. For example, we have certain types of columns that are always at the bottom. One would look there first. With PG you're forced to scan all the columns every time.
This is why I order anything alphabetically, always. Properties, methods, columns… anything. Saves you the trouble of custom conventions.
The trouble is when you need to add a new column you can only add it at the end, so you'd lose the alphabetical ordering.
Ideally, postgres would play column Tetris behind the scenes and store the columns on disk in the most appropriate way, while allowing the representation to be changed at will.
Yeah with maybe an option to manually optimize (that would rebuild the table if needed).
> It leaves the unfair impression that this is a "toy" db.

So you consider Oracle, SQL Server and DB2 also to be "toy" databases?

With SQL Server the management tool does give you a way to do this. Yes, it does a table rebuild behind the scenes. The point is that it's easy. Don't have experience with the other two, but MySQL is the most popular so it kinda sets the tone whether we like it or not.
Well, writing a procedure that rebuilds the complete table in Postgres or Oracle is easy as well. I never needed this, but I am sure, there are some sample implementations out there.

Rebuilding the entire table doesn't seem feasible for large tables to begin with. Especially with a lot of incoming and outgoing foreign keys.

I disagree that MySQL is the most "popular".

It might be the "most used" one because of so many web hosting services included it for ages by default.

Tangential, but there are almost certainly more SQLite databases in existence than every other RDBMS put together, probably by 3 or 4 orders of magnitude.

It doesn’t support column reordering either.

> Please don't think of it as trivial

I've worked with relational databases for 20+ years. This is the very first time I heard of this.

I've worked with DBs for 20+ years as well. This is a quality of life type of improvement. If you've worked mainly with DBs that don't make this easy it's hard to know what you're missing. Do a search for column reordering for PG and you'll get a ton of hits.
The problem is that we hear a lot of different features touted as "the crucial missing one"...

Anyway, there's been work on this in the past, which recently has been picked up again. It's not all that trivial to do well.

The problem is that we hear a lot of different features touted as "the crucial missing one" -- I would definitely put this into the polish category, but items in that category are also important; especially for those with a MySQL background.

which recently has been picked up again -- that's awesome to hear