|
|
|
|
|
by crznp
1824 days ago
|
|
It seems to me that the basic problem is that binary trees are the wrong data type. For instance, you can transform the tree to balance it: p1 AND (p2 AND (p3 AND notp4)) -> (p1 AND p2) AND (p3 AND notp4)
But the abstraction is specifying the order of operations unnecessarily. Using general trees, I think you avoid the need to transform the order of operations and "NOT" doesn't have to be a special case. ALL (p1 p2 p3 NOT(p4))
Is there any reason to choose binary trees for this? (Other than inertia). |
|
That needs an "in SQL", the standard imperative language operator ordering has short-cut operations in there (a is null or a.value == true) etc.
In the code I work with, this actually sorts the conditions based on estimated selectivity[1] and type (long compares to constant are cheaper on a columnar data-set due to the SIMD, but string isn't etc).
> Is there any reason to choose binary trees for this?
The parse-tree does come off as binary because inserting logical parentheses makes it easier to tackle, because there are association rules which neatly go into a BinaryOp structure when dealing with operator precedence in parsing.
So it is easier to handle the parsing when you treat (a + (b + c) ) and (a / (b / c)) in similar fashion.
I won't make the same mistake again if I have to build a SQL engine, but this actually made the logical expression match the parse tree very closely and was a good enough generalization until the traversal time bugs[2] started to pop up.
[1] - https://github.com/apache/hive/blob/master/common/src/java/o... [2] - https://issues.apache.org/jira/browse/HIVE-9166