Hacker News new | ask | show | jobs
by code_runner 491 days ago
I appreciate that somebody somewhere may appreciate and enjoy this, but I am not that person. I love SQL. I have always loved SQL. I know why others don't, but I do and its beautiful.
2 comments

While SQL is great, complex analytical queries are much easier to write, read and modify when they are written in pipe-based syntax because of the reasons outlined in the announcement:

- Pipe-based query is executed in clear sequential stages according to the pipes written in the query. SQL execution stages aren't clear enough.

- SQL requires mentioning GROUP BY fields twice - at SELECT and GROUP BY statements. This complicates modifying the query, since you need to modify the set of GROUP BY fields at SELECT too.

- There are no limits on the combination and the number of pipes, which can be used in a single query. There are various limits in SQL. For example, you cannot use WHERE after GROUP BY (there is HAVING workaround, but it looks unnatural). Another example - you cannot easily use GROUP BY multiple times with arbitrary additional calculations and transformations in the middle (of course, you can use subqueries as a workaround for these limitations, but this will complicate the resulting query too much).

I recommend reading the following short migration guide from SQL to pipe-based LogsQL (query language optimized for typical queries over logs), in order to feel the difference - https://docs.victoriametrics.com/victorialogs/sql-to-logsql/

I universally use column positions in the GROUP BY.

  SELECT name, count(*)
  FROM account
  GROUP BY 1
That is still annoying, but requires less typing+modification.
i go back and forth on this a lot, but with fewer and fewer "in production" SQL queries in my life, number based group by is definitely my default
It's insufficient to say that something is objectively better. Often times user preferences are multi-modal, and people cluster around groups. Pipe-based database query syntax may be objectively "easier" by some mathematical deduction, but by the laws of statistics > 50% of the userbase may prefer a different syntax.
This isn't an alternative to it.

You can still write

``` SELECT country, COUNT() AS num_records FROM t GROUP BY country ```

And then append something like

``` |> EXTEND FLOOR(num_records / 10) 10 AS num_records_bin |> AGGREGATE COUNT(*) AS num_countries GROUP BY num_records_bin ```

to get a histogram without having to nest stuff

this is SUPER interesting actually. I definitely didn't realize this. I'm still probably going to use a small army of CTEs for this sort of stuff, but I'm very interested to give this a shot the next time I'm exploring a dataset.
The main value proposition is that it unnests the code. Constantly moving logic into CTEs is another way to do it but comparatively it feels very clunky after a while. For example to inspect intermediate stages of a raft of CTEs, you end up having to write a different final clause. You also can't easily skip a stage. With pipes you just comment out the part of the query you don't want, at the bottom or in the middle, and you still have a syntactically valid query.

With today's pipe syntax CTEs are still needed to deal with teeing situations (multiple references to a single table) but for just a chain of transforms it is a lot simpler.