I appreciate that somebody somewhere may appreciate and enjoy this, but I am not that person. I love SQL. I have always loved SQL. I know why others don't, but I do and its beautiful.
While SQL is great, complex analytical queries are much easier to write, read and modify when they are written in pipe-based syntax because of the reasons outlined in the announcement:
- Pipe-based query is executed in clear sequential stages according to the pipes written in the query. SQL execution stages aren't clear enough.
- SQL requires mentioning GROUP BY fields twice - at SELECT and GROUP BY statements. This complicates modifying the query, since you need to modify the set of GROUP BY fields at SELECT too.
- There are no limits on the combination and the number of pipes, which can be used in a single query. There are various limits in SQL. For example, you cannot use WHERE after GROUP BY (there is HAVING workaround, but it looks unnatural). Another example - you cannot easily use GROUP BY multiple times with arbitrary additional calculations and transformations in the middle (of course, you can use subqueries as a workaround for these limitations, but this will complicate the resulting query too much).
It's insufficient to say that something is objectively better. Often times user preferences are multi-modal, and people cluster around groups. Pipe-based database query syntax may be objectively "easier" by some mathematical deduction, but by the laws of statistics > 50% of the userbase may prefer a different syntax.
this is SUPER interesting actually. I definitely didn't realize this. I'm still probably going to use a small army of CTEs for this sort of stuff, but I'm very interested to give this a shot the next time I'm exploring a dataset.
The main value proposition is that it unnests the code. Constantly moving logic into CTEs is another way to do it but comparatively it feels very clunky after a while. For example to inspect intermediate stages of a raft of CTEs, you end up having to write a different final clause. You also can't easily skip a stage. With pipes you just comment out the part of the query you don't want, at the bottom or in the middle, and you still have a syntactically valid query.
With today's pipe syntax CTEs are still needed to deal with teeing situations (multiple references to a single table) but for just a chain of transforms it is a lot simpler.
- Pipe-based query is executed in clear sequential stages according to the pipes written in the query. SQL execution stages aren't clear enough.
- SQL requires mentioning GROUP BY fields twice - at SELECT and GROUP BY statements. This complicates modifying the query, since you need to modify the set of GROUP BY fields at SELECT too.
- There are no limits on the combination and the number of pipes, which can be used in a single query. There are various limits in SQL. For example, you cannot use WHERE after GROUP BY (there is HAVING workaround, but it looks unnatural). Another example - you cannot easily use GROUP BY multiple times with arbitrary additional calculations and transformations in the middle (of course, you can use subqueries as a workaround for these limitations, but this will complicate the resulting query too much).
I recommend reading the following short migration guide from SQL to pipe-based LogsQL (query language optimized for typical queries over logs), in order to feel the difference - https://docs.victoriametrics.com/victorialogs/sql-to-logsql/