by johnny-lee 2506 days ago
I went down this road years ago.

While there's no install and initial results are quick to appear, the false positives that grep or any string search tool generates will make the cynics shoot down this simple attempt to find problems in the source code.

Problems that arose:

- what about use of those questionable APIs/constants in strings (perhaps for logging) or in comments?

- some of the APIs listed in the article were only questionable when certain values were used. Sometimes you can get grep (or your search tool of choice) to play along, but if the API call spans multiple lines, or the constant has been assigned to a variable that is used instead, a plain string search won't help.

- it's hard to ignore previously flagged but accepted uses of the API/constants.

- so there's a possible bug reported, but devs usually want to see the context of the problem (the code that contains the problem) quickly/easily. Some text editors can grok the grep output and place the cursor at the particular line/character with the problem, some can't.
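The last two points can be worked around to a degree. Here is a minimal Python sketch, assuming a single hypothetical rule (calls to strcpy) and a hypothetical baseline set of accepted findings. It fingerprints each finding by file and line content rather than line number, and prints a path:line:col: layout similar to compiler diagnostics, which many editors can parse.

```python
import re

# Hypothetical rule, chosen only for illustration: flag calls to strcpy.
PATTERN = re.compile(r"\bstrcpy\s*\(")

def scan(path, text):
    """Return findings as (path, line, column, source_line), 1-based."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in PATTERN.finditer(line):
            findings.append((path, lineno, m.start() + 1, line.strip()))
    return findings

def fingerprint(finding):
    """Identify a finding by file and line content, not line number,
    so an accepted use stays suppressed when unrelated lines shift."""
    path, _lineno, _col, source = finding
    return f"{path}|{source}"

def new_findings(findings, accepted):
    """Drop findings whose fingerprint is in the accepted baseline."""
    return [f for f in findings if fingerprint(f) not in accepted]

def format_finding(finding):
    """Render a finding as path:line:col: source."""
    path, lineno, col, source = finding
    return f"{path}:{lineno}:{col}: {source}"
```

The obvious weakness: two identical lines in the same file share a fingerprint, so accepting one accepts both.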

If you go down that road to try and reduce false positives, you'll end up with a parser for your development language of choice.
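To show what the first step down that road looks like, here is a rough Python sketch that blanks out C comments and string/character literals before searching, so hits inside them disappear while line numbers stay intact. The function names are made up for this example. It ignores preprocessor conditionals, C++ raw strings, and backslash-continued // comments, which is exactly where the "you end up writing a parser" problem starts.

```python
import re

# Alternatives are tried at each position, so whichever construct starts
# first wins: a quote inside a comment is consumed as part of the comment.
_TOKEN = re.compile(
    r"""
      /\*.*?\*/              # block comment (may span lines)
    | //[^\n]*               # line comment
    | "(?:\\.|[^"\\\n])*"    # string literal, skipping escaped characters
    | '(?:\\.|[^'\\\n])*'    # character literal
    """,
    re.DOTALL | re.VERBOSE,
)

def blank_comments_and_strings(source):
    """Replace comments and literals with spaces, keeping newlines."""
    def blank(match):
        return re.sub(r"[^\n]", " ", match.group(0))
    return _TOKEN.sub(blank, source)

def find_calls(source, name):
    """Return 1-based line numbers where `name(` appears in real code."""
    cleaned = blank_comments_and_strings(source)
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*\(")
    return [cleaned.count("\n", 0, m.start()) + 1
            for m in pattern.finditer(cleaned)]
```

Because \s also matches newlines, a call whose opening parenthesis sits on the next line is still found, but a constant hidden behind a variable is not.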

1 comments

I haven't tried this approach, but having spent years using one of the best commercial SAST tools, I'm reluctant to dismiss it too quickly.

My SAST generates tons of false positives and is unforgivably slow. If this is orders of magnitude faster, it might be worth the extra false positives.

As a side note, my dream is a SAST that comments directly in the PR like a human reviewer would. Maybe that exists?

The SAST program is probably doing a lot more than a string search tool does.

If the SAST has to process C/C++ source code, then the SAST will parse all the #include'd header files. The SAST may track values to determine if illegal/uninitialized values are used.

A string search tool will skip doing all of that.

If the class of problems you're looking for contains only bad functions/constants, then a string search tool may be fine.

But as I mentioned before, the string search tool may get confused if these bad strings occur in strings/comments/irrelevant #if/#else/#elif sections.

There is another class of bugs, those dealing with data values, which a string search tool can't handle easily.

As an example, PC-Lint lists the types of problems the program may flag: https://www.gimpel.com/html/lintchks.htm. A string search tool won't know about classes and virtual destructors or other concepts relevant to the programming language in question.

For the string search tool, you'd either invoke it several times on the same source code with a different search string each time or, slightly more efficiently, pass it one long search string containing all your search strings as alternatives.

In either case, when the string search tool spits out a positive result, it won't explain why there is a problem. The dev will have to know or look up the problem associated with that search result.
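A thin wrapper can cover both points. The Python sketch below joins several patterns into one alternation and uses named groups to map each hit back to an explanation. The three rules and their messages are examples I picked for illustration, not a list from the article.

```python
import re

# Hypothetical rule table: name -> (regex, explanation shown to the dev).
RULES = {
    "gets": (r"\bgets\s*\(",
             "gets() cannot limit input length; use fgets()."),
    "strcpy": (r"\bstrcpy\s*\(",
               "strcpy() does not check the destination size."),
    "sprintf": (r"\bsprintf\s*\(",
                "sprintf() can overflow the buffer; consider snprintf()."),
}

# One pass over the source: each rule becomes a named alternative.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, (rx, _msg) in RULES.items())
)

def scan(text):
    """Return (line_number, rule_name, explanation) for every hit."""
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in COMBINED.finditer(line):
            rule = m.lastgroup  # name of the alternative that matched
            results.append((lineno, rule, RULES[rule][1]))
    return results
```

The \b anchor keeps the gets rule from firing on fgets(, but this is still a plain text match, so hits inside strings and comments remain.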

When I worked in this area, C/C++ compilers stopped at syntax errors. Most have since gotten better at flagging common problems like variable assignments within if statements, operator precedence bugs, and printf-format string bugs.

Some divisions at Microsoft required devs to run a lightweight SAST before committing changes to locate possible problems ASAP.

It's relatively easy to integrate a SAST into your build system to scan the modified source code just before you're ready to commit the changes.
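With git, one way to do that is a pre-commit hook. The Python sketch below is only an outline: my-sast is a placeholder command name, not a real tool, and the extension list is an assumption about what counts as C/C++ source.

```python
import subprocess

# Assumed set of C/C++ source extensions; adjust for your code base.
C_EXTENSIONS = (".c", ".h", ".cc", ".cpp", ".hpp")

def staged_files():
    """List files staged for commit (added, copied, or modified)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()

def files_to_scan(paths):
    """Keep only the paths that look like C/C++ source or headers."""
    return [p for p in paths if p.endswith(C_EXTENSIONS)]

def run_scanner(paths, command=("my-sast",)):
    """Run the scanner on the given files and return its exit code.
    `my-sast` is a placeholder; substitute your own tool and flags."""
    if not paths:
        return 0  # nothing relevant staged, so nothing to check
    return subprocess.run([*command, *paths]).returncode
```

A script in .git/hooks/pre-commit could call run_scanner(files_to_scan(staged_files())) and exit with the result, since a non-zero exit from that hook aborts the commit.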