|
|
|
|
|
by b9a2cab5
1665 days ago
|
|
Being able to see exactly which columns are partitioned on so I know whether my query is going to be efficient or require full scan+materialization. I understand from your link Snowflake has clustering keys but it seems like this only "suggests" that Snowflake should partition on that key. Also with Spark I can look at the query plan while it's executing and figure out what part of my query is bottlenecking. Seems like Snowflake requires a query to complete for me to do that. |
|
https://docs.snowflake.com/en/user-guide/tables-clustering-k...
Snowflake does not automatically cluster tables, because it is something users need to decide if they want to or not, as well as the best clustering strategy. In some instances it is better to cluster on one column to increase performance of loads, but a different key would benefit users querying the table. Which is the right choice? It's up to the user to decide. What if you want both? Create the table with the clustering key to optimize load, then create a materialized view that clusters by the keys best for the queries and Snowflake will automatically rewrite queries to use the MV if it makes more sense. Problem(s) solved.
You can access the query profile while it is running to understand which step is currently running and where your bottlenecks are.
https://docs.snowflake.com/en/user-guide/ui-query-profile.ht...
I use this often when troubleshooting long running queries, most often users forget a join condition and it results in a semi-product join. This is easily caught with high row counts coming out of two nodes with low row counts.