by okennedy 1370 days ago
We are... Spark's DataFrame is essentially a relational algebra AST-builder. Microsoft's LINQ interprets SQL directly at compile time. All of these, however, run queries more or less directly in the system in which they're specified.

It helps to think of SQL strings as an untrusted wire format. Yes, parsing is a pain, but it comes with two main benefits: (i) the wire format is human-writable and human-readable, with all the accompanying benefits, and (ii) the wire format is easily extensible in a predictable way.

The latter is particularly useful in keeping SQL's ecosystem open. Take a front-end library like SQLAlchemy or ScalikeJDBC, for example. It's not practical for any one such library to support every extension provided by every database engine. SQL provides a fall-back for when you need a back-end feature that your front-end hasn't implemented.
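
To make the fall-back point concrete, here is a minimal sketch using Python's built-in sqlite3 module. `build_select` is a hypothetical stand-in for a front-end library (it is not SQLAlchemy's or ScalikeJDBC's API) and only models equality filters. The table and data are made up for illustration. SQLite's `GLOB` operator plays the role of the engine-specific feature the front-end never modeled.

```python
import sqlite3

def build_select(table, columns, where=None):
    """Hypothetical mini front-end: only knows equality filters.

    Table and column names are assumed to be trusted identifiers;
    values are always passed as bound parameters.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if where:
        clauses = []
        for col, val in where.items():
            clauses.append(f"{col} = ?")
            params.append(val)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("a.txt", "ann"), ("b.log", "ann"), ("c.txt", "bob")],
)

# Case 1: the front-end models this query, so no SQL is written by hand.
sql, params = build_select("files", ["name"], {"owner": "ann"})
builder_rows = sorted(conn.execute(sql, params).fetchall())

# Case 2: the front-end has no notion of SQLite's GLOB operator,
# so we fall back to the wire format itself: a raw SQL string.
raw_rows = sorted(
    conn.execute("SELECT name FROM files WHERE name GLOB ?", ("*.txt",)).fetchall()
)
```

The second query needs nothing from the front-end: because the wire format is text, the engine's extension is reachable even though the builder has never heard of it.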

1 comment

C# LINQ does pass an expression tree into the abstraction, which then serializes it to SQL; the database then deserializes that SQL back into an AST. LINQ-to-Objects runs in memory and works on the AST directly.

Also, both the LINQ language syntax and the library methods follow a builder paradigm for the expression tree. Valid, but still far from an ideal representation of an AST.
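
A rough Python sketch of the two paths described above, under stated assumptions: `Col`, `Lit`, `BinOp`, `to_sql` and `evaluate` are hypothetical names invented for this illustration, and none of this reflects how LINQ is actually implemented. Operator overloading plays the role of the builder; the same tree is then either serialized to a SQL string or walked directly in memory.

```python
from dataclasses import dataclass
import operator

class Expr:
    # Operator overloads are the "builder": each one returns a new tree node
    # instead of computing a value.
    def __gt__(self, other): return BinOp(">", self, lift(other))
    def __lt__(self, other): return BinOp("<", self, lift(other))
    def __and__(self, other): return BinOp("AND", self, lift(other))

@dataclass
class Col(Expr):
    name: str

@dataclass
class Lit(Expr):
    value: object

@dataclass
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr

def lift(x):
    """Wrap plain Python values as literal nodes."""
    return x if isinstance(x, Expr) else Lit(x)

def to_sql(e, params):
    """Path 1: serialize the tree to SQL text, collecting bound parameters."""
    if isinstance(e, Col):
        return e.name
    if isinstance(e, Lit):
        params.append(e.value)
        return "?"
    return f"({to_sql(e.left, params)} {e.op} {to_sql(e.right, params)})"

OPS = {">": operator.gt, "<": operator.lt, "AND": lambda a, b: a and b}

def evaluate(e, row):
    """Path 2: interpret the tree directly against an in-memory row (a dict)."""
    if isinstance(e, Col):
        return row[e.name]
    if isinstance(e, Lit):
        return e.value
    return OPS[e.op](evaluate(e.left, row), evaluate(e.right, row))

pred = (Col("age") > 30) & (Col("score") < 5)

params = []
sql = to_sql(pred, params)  # "((age > ?) AND (score < ?))", params == [30, 5]

rows = [{"age": 41, "score": 3}, {"age": 25, "score": 1}]
matches = [r for r in rows if evaluate(pred, r)]  # only the first row matches
```

Note that `pred` never exists as a literal tree in the source; it is assembled at run time by the overloaded operators, which is the builder paradigm the comment is pointing at.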