Hacker News new | ask | show | jobs
by tyingq 1887 days ago
I'd be careful with how much of a warm fuzzy the tool gives you. See this example from my other comment in the thread: https://news.ycombinator.com/item?id=26905880 If it were really looking at AST level data, that wouldn't have fooled it.

I suspect there would be similar issues with your example of ensuring no use of eval() in PHP. So it seems okay to keep your own developers informed, but I wouldn't use it, alone, to vet outside code. PHP has eval-like functionality buried in preg_replace(), assert(), and probably other places. This tool also doesn't seem to dig into namespaced "aliases".

1 comments

> If it were really looking at AST level data, that wouldn't have fooled it.

Semgrep does look at an AST; but that counterexample is not something you can "fix" solely by looking at an AST. You need actual Python-specific semantic analysis that knows that all "open" functions like 'print' come from the builtins module, and thus are bound to the same identifier. They're literally built into the implementation, it's not something you can "discover" from analyzing existing Python source. Even if you had a perfectly accurate python AST it couldn't "tell" you this fact, it's a priori knowledge, and all analysis engines need a base set of facts like this that they work from.

> but I wouldn't use it, alone, to vet outside code.

I mean, nobody seems to be suggesting this though, and the OP quite literally stated the major value of the tool is enforcing domain/codebase-specific rules among a team. Which is a really good use for it! There are tons of little useful patterns you can codify this way.

>Semgrep does look at an AST; but that counterexample is not something you can "fix" solely by looking at an AST

Perhaps I worded it poorly. Dumping the python AST for builtins.print() makes it pretty clear that it's "print" though. So I'm curious why that skirts the rule.

>I mean, nobody seems to be suggesting this though

Not specificially, but the context is using it for security purposes with phrases like "every use of a potentially injectable function such as exec,system,etc. in PHP". Felt like that was worth commenting on.

> Dumping the python AST for builtins.print() makes it pretty clear that it's "print" though.

No, the AST only tells you it's a method call on something called "builtins." You need the separate semantic knowledge of what builtins is in order to figure it out. Parsing + AST just means it sees "method call of `print` on `builtins` object". Regular print calls would come through as "regular function call of `print`".

> Perhaps I worded it poorly. Dumping the python AST for builtins.print() makes it pretty clear that it's "print" though. So I'm curious why that skirts the rule.

To echo the other replies, the AST for builtins.print() is the same as the ast for mymodule.print() and, in fact, if you stick a builtins.py in the right place, you'll be able to prevent the import of the standard library builtin module, while the ast's would be identical.