by saurik 5013 days ago
The developer spent a lot of time, both in the design wiki and in the talk, on the importance of being able to determine whether a / indicates a regular expression literal or a division operator entirely within the lexer (as opposed to using the parser, which is how JavaScript is generally defined). Despite that, the algorithm that was implemented does not actually work.

First off, an example where it works:

    a
    /5/
    7
If you run this through sjs you get:

    a / 5 / 7;
This is because, in JavaScript, a statement continues across line boundaries until it is either explicitly terminated by a semicolon or runs into a syntax error, in which case the parse is retried at that point as if a semicolon had been provided. In this case, that means we have a single statement that is a division of these three expressions: a, 5, and 7.
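
The continuation rule can be checked in plain JavaScript without sjs. A minimal sketch, assuming a hypothetical value of 70 for a (chosen only so the arithmetic is easy to verify):

```javascript
// Neither newline below ends the statement: `a` followed by `/` is not a
// syntax error, so no semicolon is inserted and each `/` is a division.
const a = 70; // hypothetical value, not from the original example
const quotient = a
/5/
7;

console.log(quotient); // 2, i.e. (70 / 5) / 7
```

If /5/ had been lexed as a regular expression literal, the declaration would not parse as a single division at all.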

However, let's take a more difficult case:

    a = function() {}
    /5/
    7
This is also a single statement: you are entirely allowed to attempt to divide a function literal by a number; you will simply get the value NaN as output. If you take this file, add a "console.log(a)" to the end, and run it through node, that is in fact what you will get: NaN. However, when the file is first run through sjs, you instead get "[Function]".
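
The NaN result can be reproduced directly. A minimal sketch in plain JavaScript, with a `let` declaration added as an assumption so the snippet also runs in strict or module code:

```javascript
// Same shape as the example above, with no sjs involved.
let a;
a = function() {}
/5/
7

// A function converts to NaN when used as a number, so (function / 5) / 7 is NaN.
console.log(a);        // NaN
console.log(typeof a); // "number", not "function"
```

Plain node agrees with the single-statement reading, so the "[Function]" result comes only from the sjs output.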

The reason is that sjs translated the code to:

    a = function () {
    };
    /5/;
    7;
This is incorrect, and demonstrates how difficult some of these underlying issues are when parsing languages that have intertwined lexer and parser state. :( Attempting some other test cases involving regular expressions (but not semicolon insertion) also failed: it seems a lot more work will need to be done on this before it will be able to process general input (and it is not 100% clear to me that the shortcut required is even possible: I haven't thought enough about it yet to say for certain, however).
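
To illustrate why a lexer-only decision is hard, here is a deliberately simplified sketch of a "look at the previous token" heuristic. It is not the sjs algorithm; the function name `classifySlash` and the `{ type, value }` token shape are hypothetical, made up for this example.

```javascript
// Hypothetical token shape: { type: string, value: string }. Not taken from sjs.
// Returns how a `/` should be read, judging only by the token before it.
function classifySlash(prev) {
  if (prev === null) return "regex"; // start of input: nothing to divide
  switch (prev.type) {
    case "identifier":
    case "number":
    case "string":
    case "regex":
      return "divide"; // an operand just ended
    case "keyword":
      // A few keywords are operands; after `return`, `typeof`, etc. an operand is expected.
      return ["this", "true", "false", "null"].includes(prev.value) ? "divide" : "regex";
    case "punctuator":
      if (prev.value === "]") return "divide";
      // `)` may close a parenthesized expression or an `if (...)` head;
      // `}` may close a block, an object literal, or a function expression;
      // `++` and `--` may be prefix or postfix.
      if ([")", "}", "++", "--"].includes(prev.value)) return "ambiguous";
      return "regex"; // operators, `(`, `,`, `;`, `{`, ...
    default:
      return "ambiguous";
  }
}

console.log(classifySlash({ type: "identifier", value: "a" })); // "divide"
console.log(classifySlash({ type: "punctuator", value: "=" })); // "regex"
console.log(classifySlash({ type: "punctuator", value: "}" })); // "ambiguous"
```

The last call is the case from this comment: after `a = function() {}` the slash is a division, while after a standalone block `{}` it starts a regular expression. The previous token alone cannot tell the two apart, so a lexer has to track more context than this.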

(I work on the JavaScript parser for a compile-to-JS language used by people doing jailbroken-iOS development for live introspection of running processes, and thereby that was the first thing I was interested in: how well the parser worked. ;P I have intentions to add reader macros, and then replace all of the extra Objective-C syntax I added with them, but I haven't gotten around to it yet. FWIW: I actually found and fixed a bug in my parser while writing this comment. ;P)


Yeah there are certainly a few bugs remaining in the reader :)

It actually does the right thing if the function is named:

    a = function foo() {}
    /5/
    7
correctly translates to:

    a = function foo() {
    } / 5 / 7;
But clearly I missed the unnamed case. You mentioned finding a few other bugs? Would you mind submitting a bug report on github? I would love to fix those too!
Sorry, was distracted by the other conversation with dherman, and really shouldn't be spending any time on this anyway, but I've verified your operator associativity is wrong.

To start with some example code:

    for (var a = 7 in /7/ in 9 in b);
If I run that in node I get:

    TypeError: Cannot use 'in' operator to search for '/7/' in 9
That is because this code is roughly equivalent to:

    var a = 7; for (a in ((/7/ in 9) in b));
However, it gets converted by sjs to:

    for (var a = 7 in (/7/ in (9 in b)));
That's obviously different, and gives this error instead:

    ReferenceError: b is not defined
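
The two groupings can be compared outside of a for statement. A minimal sketch, where the helper `errorName` is a hypothetical name introduced for this example and `b` is deliberately left undeclared, as above:

```javascript
// Run a thunk and report the name of the error it throws, if any.
function errorName(thunk) {
  try {
    thunk();
    return "none";
  } catch (e) {
    return e.name;
  }
}

// `b` is intentionally undeclared, matching the example in this comment.
const ungrouped = errorName(() => /7/ in 9 in b);
const leftGrouped = errorName(() => (/7/ in 9) in b);
const rightGrouped = errorName(() => /7/ in (9 in b));

console.log(ungrouped);    // "TypeError"
console.log(leftGrouped);  // "TypeError"
console.log(rightGrouped); // "ReferenceError"
```

The ungrouped form fails the same way as the left-grouped one, because `/7/ in 9` is evaluated (and throws) before `b` is ever looked up.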
(I really should get back to actually doing my job now, though; if dherman responds again I'll totally notice and follow up: that conversation is really interesting to me.)
Thanks for taking the time to write these up! I'm tracking them here [1]. The first two should be fixed and I should have the third ready soon.

[1] https://github.com/mozilla/sweet.js/issues/18

function a() { return /7/; }