Hacker News new | ask | show | jobs
by cotera-grant 840 days ago
> I selfishly hope this can smooth over all the missing SQL functionality in redshift

Hi! I'm Grant, I work at Cotera and wrote most of the warehouse compatibility stuff for NASTY

Redshift is the bane of my existence. It was definitely the hardest warehouse to write a NASTY compatible SQL gen for.

A couple of annoyances that immediately come to mind.

1. Redshift Query Planner does wild stuff

At Cotera we'll typically develop analytics libraries on one warehouse working closely with a customer and use the same library for other customers afterwards. A library will go to prod on one warehouse and then get start running on others as new customers with different warehouses want the functionality.

Moving a library between Snowflake, BigQuery an Postgres is almost never a problem performance wise. In, Redshift the semantics will be correct but performance can unexpectedly fall off a cliff for innocuous stuff. We typically write a bunch of unit tests so it's pretty easy to refactor, but I've been shocked at the things that Redshift can't optimize that everyone else had no problem with

2. Redshift does silly stuff with types of literals.

  with cte as (select *, 'foo' as "bar" from "cotera_data".foo) select coalesce("bar", 'baz') from cte;
Fails with the error `[XX000] ERROR: failed to find conversion function from "unknown" to text` Because 'foo' is passed as `any` type...

this fixes is but the error is bizarre and shows up way far away from the problem

  with cte as (select *, 'foo'::text as "bar" from "cotera_data".foo) select coalesce("bar", 'baz') from cte;
(NASTY now fixes this for you when it detects it will happen)

3. The `super` type breaks referential transparency

Here's just one head scratching example, but there are many super type limitations

  -- Allowed
  with bar as (select (json_parse('{"a": 1}')) as foo) select foo.a from bar;

  -- Not allowed
  select (json_parse('{"a": 1}')).a as foo

  -- [0A000] ERROR: applying array subscript on complex expression of SUPER type is currently not supported
4. Leader Only vs Compute Node Functions. Basic things like `generate_series` blow up in surprising ways

From the NASTY source code for Redshift

  // Valid redshift
  // ```
  // select generate_series(0, 10);
  // ```
  //
  // Not valid redshift
  // ```
  // -- Inserts run on compute nodes
  // insert into foo (a) (
  //     -- Leader only function
  //     select generate_series(0, 10) as a
  // )
  // ```
  //
  // This is because `generate_series` is a leader only function, so it can’t be run on worker nodes
  // https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_functions_leader_node_only.html
  // https://docs.aws.amazon.com/redshift/latest/dg/c_sql-functions-leader-node.html
  // https://stackoverflow.com/questions/62716606/redshift-loading-data-issue-specified-types-or-functions-one-per-info-message
  // https://stackoverflow.com/questions/17282276/using-sql-function-generate-series-in-redshift#comment96402527_22782384
  //
  // Recurive CTEs are NOT supported in subqueries
  // ```
  // -- Not valid
  // select \* from (
  //    with recursive t(n) as (
  //        select 1::integer union all select n + 1 from t where n < 100
  //    ) select n from t
  // );
  // ```
  // To get around this, we can use the approach outlined by how dbt does ansi sql generate_series

  // https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/generate_series.sql
  const numbers = (upperBound: number) => {
    if (upperBound > 2 ** 11) {
      throw new Error(
        `We only support generating series in Reshift where the upperBound is less than ${
          2 ** 11
        }`
      );
    }

    return `
  (
    with p as (
      select 0::integer as generated_number union all select 1::integer
    ),
      unioned as (
      select
        (   p0.generated_number * power(2, 0) 
  // ... Omitted for brevity
         +  p11.generated_number * power(2, 11) 
        ) as generated_number
      from
        p as p0
        cross join p as p1
  // ... Omitted for brevity

        cross join p as p11
    )
    select generated_number::integer from unioned where generated_number <= ${upperBound} order by generated_number
  )
  `;
  };