Hacker News new | ask | show | jobs
by b9a2cab5 1665 days ago
Being able to see exactly which columns are partitioned on so I know whether my query is going to be efficient or require full scan+materialization. I understand from your link Snowflake has clustering keys but it seems like this only "suggests" that Snowflake should partition on that key. Also with Spark I can look at the query plan while it's executing and figure out what part of my query is bottlenecking. Seems like Snowflake requires a query to complete for me to do that.
1 comments

Clustering information is available in the object browser in the UI, as well as the results of a show table.

https://docs.snowflake.com/en/user-guide/tables-clustering-k...

Snowflake does not automatically cluster tables, because it is something users need to decide if they want to or not, as well as the best clustering strategy. In some instances it is better to cluster on one column to increase performance of loads, but a different key would benefit users querying the table. Which is the right choice? It's up to the user to decide. What if you want both? Create the table with the clustering key to optimize load, then create a materialized view that clusters by the keys best for the queries and Snowflake will automatically rewrite queries to use the MV if it makes more sense. Problem(s) solved.

You can access the query profile while it is running to understand which step is currently running and where your bottlenecks are.

https://docs.snowflake.com/en/user-guide/ui-query-profile.ht...

I use this often when troubleshooting long running queries, most often users forget a join condition and it results in a semi-product join. This is easily caught with high row counts coming out of two nodes with low row counts.