Hacker News new | ask | show | jobs
by senorrib 82 days ago
Reasons 1-3 could very well be done with ClickHouse policies (RLS) and good data warehouse design. In fact, that’s more secure than a compiler adding a where to a query ran by an all mighty user.

Reason 4 is probably an improvement, but could probably be done with CH functions.

The problem with custom DSLs like this is that tradeoff a massive ecosystem for very little benefit.

2 comments

As long as you don't deviate too much from ANSI, I think the 'light sql DSL' approach has a lot of pros when you control the UX. (so UIs, in particular, are fantastic for this approach - what they seem to be targeting with queryies and dashboards). It's more of a product experience; tables are a terrible product surface to manage.

Agreed with the ecosystem cons getting much heavier as you move outside the product surface area.

Personally I think that's worse. SQL - which is almost ubiqutous - already suffers from a fragmentation problem because of the complex and dated standardization setup. When I learn a new DBMS the two questions I ask at the very start are: 1. what common but non-standard features are supported? 2. what new anchor-features (often cool but also often intended to lock me to the vendor) am I going to pick up?

First I need to learn a new (even easy & familiar) language, second I need to be aware of what's proprietary & locks me to the vendor platform. I'd suspect they see the second as a benefit they get IF they can convince people to accept the first.

I actually 100% agree with your for a new DBMS and share your frustration with vendor-specific features and lock-in. At that level, it's often actively counterproductive for insurgent DBs - ecosystem tooling needs more work to interface with your shiny new DB, etc - and that's why we always see anyone who starts with a non-standard SQL converge on offering ANSI SQL eventually.

I think an application that exposes a curated dataset through a SQL-like interface - so the dashboard/analytic query case described here - is where I think this approach has value. You actually don't want to expose raw tables, INFORMATION_SCHEMA, etc - you're offering a dedicated query language on top of a higher level data product offering, and you might as well take the best of SQL and leave the bits you don't need. (You're not offering a database as a service; you're offering data as a service).

You’re right RLS can go a long way here. With complex RBAC rules it can get tricky though.

The main advantages of a DSL are you can expose a nicer interface to users (table names, columns, virtual columns, automatic joins, query optimization).

We very intentionally kept the syntax as close to regular ClickHouse as possible but added some functions.

> table names, columns, virtual columns

This sounds solvable with clickhouse views?

> automatic joins

Is this also not solvable with views? Also, clickhouse heavily discourages joins so I wonder how often this winds up being beneficial? For us, we only ever join against tenant metadata (i.e. resolving ID to name)

> query optimization

This sounds potentially interesting - clickhouse's query optimizer is not great IME, but it's definitely getting better