Hacker News new | ask | show | jobs
by pests 953 days ago
Are there any pros/cons to the raw query, CTE, and view approches?
2 comments

Yes.

Using a view in this example, you can't dynamically change which files are being selected (not even with joins or where clauses). What if new files are generated and suddenly there are more or less files? Therefore you probably wouldn't want to encapsulate your SQL into a view. Most of the time you would probably bind the list of files in your SQL as needed:

   SELECT SUM(size) AS size FROM read_parquet(:files);
But in this case, a table macro/function might also be an option:

   CREATE MACRO GET_TOTAL_SIZE(num_of_files) AS TABLE (
        SELECT
            SUM(size) AS size
        FROM read_parquet(
            list_transform(generate_series(0, num_of_files), n -> 'https://huggingface.co/datasets/vivym/midjourney-messages/resolve/main/data/' || format('{:06d}', n) || '.parquet'
            )
        )
    );

    SELECT * FROM GET_MJ_TOTAL_SIZE(55);
Not necessarily related to this article, but CTEs are useful for breaking down a complex query into more understandable chunks. Moreover, you can do interesting things within the CTE's temp-tables like recursion or freezing/materializing a temp-table's results so that it only gets evaluated one time, instead of every time it gets referenced.

http://duckdb.org/docs/sql/query_syntax/with#recursive-ctes http://duckdb.org/docs/sql/query_syntax/with#materialized-ct...

From my experience with other databases my assumption for DuckDB is:

- Using a raw query, a CTE or a view will have no impact at all on query performance - they'll be optimized down to exactly the same operations (or to a query plan that's similar enough that the differences are negligible)

- CTEs are mainly useful for breaking down more complicated queries - so not great for this example, but really useful the moment you start doing anything more complicated.

- Views are effectively persistent CTEs - they're great if you want to permanently "bookmark" more complex pieces of queries to use later.

I wrote a bit more about CTEs here: https://datasette.io/tutorials/data-analysis#ctes