by okennedy
1370 days ago
We are... Spark's DataFrame is essentially a relational algebra AST-builder. Microsoft's LINQ interprets SQL directly at compile time. All of these, however, run queries more or less directly in the system in which they're specified. It helps to think of SQL strings as an untrusted wire format. Yes, parsing is a pain, but it comes with two main benefits:
(i) The wire format is human writable/interpretable, with all the accompanying benefits, and
(ii) The wire format is easily extensible in a predictable way.
The latter is particularly useful in keeping SQL's ecosystem open. Take a front-end library like SQLAlchemy or ScalikeJDBC, for example. It's not practical for any one such library to support every extension provided by every database engine. SQL provides a fall-back for when you need a back-end feature that hasn't been implemented in a given front-end.
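
To make the fall-back point concrete, here's a minimal sketch using Python's built-in sqlite3 module. The `Select` node and `run` helper are hypothetical stand-ins for a front-end library (they are not SQLAlchemy's or ScalikeJDBC's API), and the toy builder deliberately knows nothing about engine-specific functions, so the raw SQL string is the escape hatch:

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Select:
    """Hypothetical AST node: SELECT <columns> FROM <table> [WHERE <column> = ?]."""
    table: str
    columns: tuple
    where_column: str = None
    where_value: object = None

    def to_sql(self):
        # Serialize the tree into the "wire format": a SQL string plus parameters.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        params = ()
        if self.where_column is not None:
            sql += f" WHERE {self.where_column} = ?"
            params = (self.where_value,)
        return sql, params


def run(conn, query):
    # Accept either a builder node or a raw SQL string (the fall-back path).
    if isinstance(query, Select):
        sql, params = query.to_sql()
    else:
        sql, params = query, ()
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob"), (3, "cy")]
)

# Path 1: the front-end builds the query for us.
built = run(conn, Select("users", ("name",), "id", 2))

# Path 2: typeof() is a SQLite function the toy builder has no node for,
# so we hand the engine a SQL string directly.
raw = run(conn, "SELECT typeof(id) FROM users WHERE id = 1")
```

Both paths end up as the same kind of string on the wire; the builder is a convenience layered on top, not a replacement for it.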
Also, both LINQ's language syntax and its library methods follow a builder paradigm for the expression tree. Valid, but still far from an ideal representation of an AST.
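
A rough sketch of what "builder paradigm" means here, in Python rather than C#. All node and builder names (`Col`, `Lit`, `Gt`, `Filter`, `Query`) are hypothetical and are not LINQ's actual types; the point is only that the fluent calls and the explicit constructors produce the same tree:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Gt:
    left: object
    right: object


@dataclass(frozen=True)
class Filter:
    source: str
    predicate: object


class Query:
    """Hypothetical fluent builder that assembles the tree call by call."""

    def __init__(self, source):
        self.source = source
        self.tree = None

    def where_gt(self, column, value):
        # Each builder call constructs the corresponding AST nodes.
        self.tree = Filter(self.source, Gt(Col(column), Lit(value)))
        return self


# The AST written out directly as data.
explicit = Filter("users", Gt(Col("age"), Lit(30)))

# The same AST reached through the builder.
built = Query("users").where_gt("age", 30).tree
```

With the builder, the tree only exists as a side effect of running the calls; with the explicit form, it is plain data that can be inspected, serialized, or rewritten.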