Hacker News
by concise_unicorn 3598 days ago
Regular expressions are a very naive way of detecting calls to 'require'. For correctness you're better off recursively walking the AST.

I've successfully used Detective in a couple of my personal projects to find all require statements.

Relevant issue on Detective: https://github.com/substack/node-detective/issues/8
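
To make the false-positive concern concrete, here is a minimal sketch. The sample source and the pattern are my own assumptions for illustration, not the regex the tool under discussion actually uses. It shows a simple pattern picking up 'require' text inside a comment and a string literal, and skipping a call whose argument is a variable:

```javascript
// Hypothetical source text, made up for this illustration only.
const source = [
  "const fs = require('fs');",
  "// require('commented-out') is not a real call",
  "const text = \"require('inside-a-string')\";",
  "const name = 'pa' + 'th';",
  "const dynamic = require(name);",
].join("\n");

// Assumed pattern: require(...) with a single quoted string argument.
const REQUIRE_RE = /\brequire\s*\(\s*['"]([^'"]+)['"]\s*\)/g;

// Return every module name the pattern captures, in source order.
function findRequiresWithRegex(code) {
  return Array.from(code.matchAll(REQUIRE_RE), (m) => m[1]);
}

const found = findRequiresWithRegex(source);
console.log(found);
```

Only the first hit is a real call; the other two come from a comment and a string, and the regex cannot tell the difference. Neither approach can resolve `require(name)` statically, though a tree walk could at least report that the call exists.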

3 comments

In practice, a regex gets all the require calls. And in many situations the very small percentage of errors is outweighed by the huge performance advantage that a simple regex provides.

So I wouldn't call regex a 'very naive' choice, since the people choosing it are aware of the tradeoffs and pick regex intentionally.

It literally takes about 50ms to parse libraries like jQuery (mobile), angular and React with acorn[1].

Whether the choice was deliberate or not is debatable, but given the speed of these parsers I'd reason that there isn't any advantage to using regexes.

Reducing the number of false positives is also one step closer to making this tool somewhat more secure, though it certainly doesn't address any of the previous comments in this thread.

[1] http://esprima.org/test/compare.html

Good point, let me look into that.
The AST? What AST is that? You don't get access to an AST.
Tools like acorn[1] and Esprima[2] can parse the JavaScript source and output an ESTree[3]-compliant syntax tree.

Then you can traverse it and modify it like any other AST.

[1] https://github.com/ternjs/acorn [2] http://esprima.org/ [3] https://github.com/estree/estree
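
As a rough sketch of what "traverse it" means here: the tree below is hand-written in ESTree shape so the example runs without installing anything. In a real tool it would come from a parser such as acorn or Esprima. The `collectRequires` helper is a hypothetical name of mine, not part of any of those libraries.

```javascript
// Hand-written ESTree-shaped tree (an assumption for illustration) for:
//   const fs = require('fs');
//   load(require('path'));
//   "require('not-a-call')";
const program = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "fs" },
          init: {
            type: "CallExpression",
            callee: { type: "Identifier", name: "require" },
            arguments: [{ type: "Literal", value: "fs" }],
          },
        },
      ],
    },
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "load" },
        arguments: [
          {
            type: "CallExpression",
            callee: { type: "Identifier", name: "require" },
            arguments: [{ type: "Literal", value: "path" }],
          },
        ],
      },
    },
    {
      // A string that merely mentions require: a Literal, not a call.
      type: "ExpressionStatement",
      expression: { type: "Literal", value: "require('not-a-call')" },
    },
  ],
};

// Hypothetical helper: recursively visit every node and collect the
// string argument of each require(<string literal>) call.
function collectRequires(node, out = []) {
  if (Array.isArray(node)) {
    for (const child of node) collectRequires(child, out);
    return out;
  }
  if (node === null || typeof node !== "object") return out;

  if (
    node.type === "CallExpression" &&
    node.callee.type === "Identifier" &&
    node.callee.name === "require" &&
    node.arguments.length === 1 &&
    node.arguments[0].type === "Literal" &&
    typeof node.arguments[0].value === "string"
  ) {
    out.push(node.arguments[0].value);
  }

  // Descend into every property; child nodes can sit under any key.
  for (const key of Object.keys(node)) collectRequires(node[key], out);
  return out;
}

const modules = collectRequires(program);
console.log(modules);
```

The string that only mentions require is skipped because it is a Literal node rather than a CallExpression, and the nested call inside `load(...)` is still found because the walk descends into arguments.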

The AST you get from any of a wide variety of parsers, of which Esprima may still be the most popular.
It already has to open and read a .js file, so it can certainly turn that into the AST for said file and then use the data from that. It will be slower, but it will also be more accurate and less likely to turn up false positives or miss things.
It has to parse a JavaScript file, which isn't trivial. The reason they use a regular expression is that a fast JavaScript parser isn't easy to implement, even though the grammar is available.
One certainly does not need to implement a parser; just use one of the many available. As the sibling here pointed out, the time it would take to parse and walk the AST is negligible compared to the downloads happening.
You're downloading megabytes over the network and doing disk I/O with them. What's the problem with 100 milliseconds of file parsing?