Hacker News
by zX41ZdbW 1238 days ago
There were countless attempts to extend or replace SQL:

OQL: https://en.wikipedia.org/wiki/Object_Query_Language
UnQL: https://www.dataversity.net/unql-a-standardized-query-langua...

More modern:

PRQL: https://prql-lang.org/
Malloy: https://news.ycombinator.com/item?id=30053860 (it is so obscure that Google autocorrects it to "Malay language")

Another example: ClickHouse supports standard SQL with many features such as window functions and SQL/JSON, and extends it to make it more convenient for data analysts by adding higher-order functions; nested data structures, arrays, tuples, and maps; aggregate function states as first-class citizens; unrestricted use of aliases anywhere in an expression; and so on.
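
A small sketch of two of these extensions, a higher-order function and an alias reused within the same SELECT (the literal array and the alias names are made up for illustration):

```sql
SELECT
    arrayMap(v -> v * 2, [1, 2, 3]) AS doubled,  -- higher-order function taking a lambda
    arraySum(doubled) AS total                   -- alias from the previous expression reused here
```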

https://github.com/ClickHouse/ClickHouse/

I'm an everyday user of ClickHouse, and I find its SQL implementation the most pleasant to use! Although that's unsurprising, because I'm also one of its authors... I also welcome innovation and improvement in SQL that doesn't require introducing a completely different language.

2 comments

ClickHouse is one of those technologies I've had half an eye on for a while.

In your completely unbiased opinion (I kid), do you think it's a good choice for the following problem?

I have multiple sensors that read different types of data about the same subject at (annoyingly) slightly different intervals, usually a few dozen times a second. This needs to be combined with other event data that happens on the order of a few times per day.

Currently I analyze this data in Python, R, and sometimes SAS (a weird proprietary language). Some coworkers use Matlab.

Is that a ClickHouse problem? If I tried it out would the ClickHouse community be interested in hearing how it goes?

Looks like a good scenario for ClickHouse.

One option is to just record all the measurements with the corresponding time. Something like a table with:

  sensor_id, time, value

To align and correlate the measurements, round the time down to some bucket, for example one minute:

  SELECT toStartOfMinute(time) AS t, anyIf(value, sensor_id = 'X'), anyIf(value, sensor_id = 'Y')
  FROM measurements
  WHERE sensor_id IN ('X', 'Y')
  GROUP BY t
  ORDER BY t

Yes, I'm interested in how it goes! milovidov@clickhouse.com

Another interesting option for correlating measurements taken at uneven intervals is ASOF JOIN.
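
A minimal sketch of what that could look like. It assumes the measurements table also has a subject_id column, which is not part of the schema above; ASOF JOIN needs at least one equality key in addition to the inequality on time, so some such column is required. For each reading of sensor X it picks the latest reading of sensor Y taken at or before it:

```sql
-- Hypothetical schema: measurements(subject_id, sensor_id, time, value).
-- subject_id is an assumption made for this example.
SELECT
    x.time AS t,
    x.value AS x_value,
    y.time AS y_time,
    y.value AS y_value
FROM (SELECT subject_id, time, value FROM measurements WHERE sensor_id = 'X') AS x
ASOF LEFT JOIN (SELECT subject_id, time, value FROM measurements WHERE sensor_id = 'Y') AS y
    ON x.subject_id = y.subject_id AND y.time <= x.time
ORDER BY t
```

Unlike the bucketing query, this keeps every X reading at its original timestamp instead of collapsing readings into one row per minute.
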
PRQL was what I was thinking of. The problem is integrating PRQL into, say, Rails.