Hacker News new | ask | show | jobs
by priceishere 74 days ago
An even simpler way imo, is explicit functions instead of a precedence table, then the code pretty much has the same structure as EBNF.

Need to parse * before +? Begin at add, have it call parse_mul for its left and right sides, and so on.

  parse_mul() {
    left = parse_literal()
    while(is_mul_token()) { // left associative
      right = parse_literal()
      make_mul_node(left, right)
    }
  }

  parse_add() {
    left = parse_mul()
    while(is_add_token()) { // left associative
      right = parse_mul()
      make_add_node(left, right)
    }
  }
Then just add more functions as you climb up the precedence levels.
3 comments

You lose in versatility, then you can't add user-defined operators, which is pretty easy with a Pratt parser.
You can have user-defined operators with plain old recursive descent.

Consider if you had functions called parse_user_ops_precedence_1, parse_user_ops_precedence_2, etc. These would simply take a table of user-defined operators as an argument (or reference some shared/global state), and participate in the same recursive callstack as all your other parsing functions.

Systemverilog has an operator precedence table with 16 levels.

https://www.academia.edu/figures/3550818/table-2-operator-pr...

Writing a recursive descent for this would require writing 16 functions, and you'd end up spending most of your time cycling through the functions to finally come across the one which applies for the given situation.

I've written straight-forward expressions parsers as you suggest, but when I had to do it for systemverilog, I used a classic shunting yard parser. You see the operator, compare its precedence against the stack and you know immediately what to do, vs possibly drilling down 16 levels of function calls to figure out what to do.

Another advantage of table-driven expression parsers is you can bail in error cases without needing to unwind countless levels of stack.

With a couple of function pointers you can climb precedence with just functions:

  parse_left_to_right(with(), is_token()) {
    left = with()
    while(is_token()) {
      right = with()
      left = operate(left, right, operator)
    }
    ret left;
  }

  p0() { ret lex digit or ident; };
  p1() { ret parse_left_right(p0, is_mul); };
  p2() { ret parse_left_right(p1, is_add); };
... and so on for all operators