Hacker News new | ask | show | jobs
by arunitc 1859 days ago
Delete From "APCRoleTableColumn" Where "ColumnName" Not In (Select SC.column_name From (SELECT SC.column_name, SC.table_name FROM information_schema.columns SC where SC.table_schema = 'public') SC, "APCRoleTable" RT Where SC.table_name = RT."TableName" and RT."TableName" = "APCRoleTableColumn"."TableName");

I know this is not an optimized SQL. But this takes about 5 seconds in Postgre while the same command runs in milliseconds in MSSQL Server. The APCRoleTableColumn has only about 5000 records. The above query is to delete all columns not present in the schema from the APCRoleTableColumn table

I used to be a heavy MSSQL user. I do love Postgre and have switched over to using it in all my projects and am not looking back. I wish it was as performant as MSSQL. This is just one example. I can list a number of others too.

4 comments

If I remember correctly, SQL Server will convert NOT IN to anti-join. PostgreSQL currently does not do that due to NOT IN being incompatible with anti-joins in regards to NULL values. There's room for improvement there by detecting if NULLs can exist or not, and converting if they can't.

If you don't need the NOT IN weirdness around NULL values then I'd suggest you just use a NOT EXISTS. That'll allow something more efficient like a Hash Anti Join to be used during the DELETE. Something like:

Delete From "APCRoleTableColumn" Where Not EXISTS (Select 1 From information_schema.columns SC INNER JOIN "APCRoleTable" RT ON SC.table_name = RT."TableName" Where RT."TableName" = "APCRoleTableColumn"."TableName" AND SC.column_name = "APCRoleTableColumn"."ColumnName" AND SC.table_schema = 'public');

Is that faster now?

It’s a little hard to parse that on mobile but it looks like you’re doing correlated subqueries against the dB schema for each row in the table you’re deleting from.

As others have said, explain analyze will show you what’s going on. I’m fairly sure this query would be fixed by flipping and / or adding an index. 5k records is nothing to pg.

Can you share the explain analyze output of the query?
Have you checked performance using different algorithms like hash-join, merge-join and nested-loop ?