Hacker News new | ask | show | jobs
by crznp 1824 days ago
It seems to me that the basic problem is that binary trees are the wrong data type. For instance, you can transform the tree to balance it:

    p1 AND (p2 AND (p3 AND notp4)) -> (p1 AND p2) AND (p3 AND notp4)
But the abstraction is specifying the order of operations unnecessarily. Using general trees, I think you avoid the need to transform the order of operations and "NOT" doesn't have to be a special case.

    ALL (p1 p2 p3 NOT(p4))
Is there any reason to choose binary trees for this? (Other than inertia).
1 comments

> the abstraction is specifying the order of operations unnecessarily

That needs an "in SQL", the standard imperative language operator ordering has short-cut operations in there (a is null or a.value == true) etc.

In the code I work with, this actually sorts the conditions based on estimated selectivity[1] and type (long compares to constant are cheaper on a columnar data-set due to the SIMD, but string isn't etc).

> Is there any reason to choose binary trees for this?

The parse-tree does come off as binary because inserting logical parentheses makes it easier to tackle, because there are association rules which neatly go into a BinaryOp structure when dealing with operator precedence in parsing.

So it is easier to handle the parsing when you treat (a + (b + c) ) and (a / (b / c)) in similar fashion.

I won't make the same mistake again if I have to build a SQL engine, but this actually made the logical expression match the parse tree very closely and was a good enough generalization until the traversal time bugs[2] started to pop up.

[1] - https://github.com/apache/hive/blob/master/common/src/java/o... [2] - https://issues.apache.org/jira/browse/HIVE-9166