by jdp23 4450 days ago
Interesting approach! Kudos to Coverity for jumping on it so quickly.

Taint analysis is notoriously prone to false positives. In addition to the reasons listed in this post, there are many situations where relations between variables mean that tainted data doesn't cause problems. [For example, the size of the memcpy target (bp) is known to be greater than payload, so even though payload is tainted, there isn't a risk of a write overrun.] But even noisy warnings can be very useful: when we first implemented simple taint analysis in PREfix a decade ago, the first run was 99% false positives, but one of the real bugs we found was in a system-level crypto module. So with the increased attention to these kinds of bugs after Heartbleed, this seems like a great time for more work on these classes of bugs.


>first run was 99% false positives but one of the real bugs we found was in a system-level crypto module.

Thinking of it as a false positive seems like the wrong perspective. The static analyzer is a tool that flags usages that are not proven to be correct. Whether the flagged code turned out to be valid isn't the issue; the issue is that your code did not prove it valid to the satisfaction of the analyzer. This isn't necessarily a failing of the analyzer, but an indication that your code should be written in a different way, or should provide more "evidence" that it's correct (e.g. if guards/size checks).

The goal should be to write code in such a way that whatever tool you're using can prove it correct. Sure, the better the tool the easier this process is. But we really need to fundamentally rethink how we approach this problem.
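
To make the "evidence" point concrete, here is a minimal sketch of a guarded copy. The function name and parameters are made up for illustration (this is not OpenSSL code); the idea is just that the tainted length is compared against both buffer sizes before the memcpy, so a tool has an explicit check to reason from.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from OpenSSL: copy an attacker-supplied
 * number of bytes only after checking it against both buffers. */
int copy_payload(unsigned char *dst, size_t dst_len,
                 const unsigned char *src, size_t src_len,
                 size_t claimed_len)
{
    /* claimed_len is the tainted value; these guards are the "evidence". */
    if (claimed_len > src_len || claimed_len > dst_len)
        return -1;

    memcpy(dst, src, claimed_len);
    return 0;
}
```

Whether a given analyzer actually accepts this as proof depends on the tool, but a check right next to the copy is about as easy as it gets for both tools and reviewers.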

I was once talking to someone with a lot of experience in this field, and he said that false positives were one of their biggest problems. If you have too many false positives, programmers end up deciding the analyzer is full of crap and either dismiss the results entirely or gloss past many ultimately useful results.

A static analyzer that will actually be used can't have too many false positives, and this is the big challenge with these things. He said that allowing some false negatives (to cut down on false positives) made the tools more effective in actually solving problems.

That said, with something like OpenSSL, you do sort of just wish the programmers would deal with it. Language design should include elements to make these sorts of static analyses easier.

That's an interesting idea, to have a language and a static analyzer created for each other simultaneously. Constructs that are hard for a static analyzer to reason about would be left out (or relegated to an "unsafe" context). I wonder if there's any work on how well Rust holds up against the state of the art in static analysis?

I think it works even better if you can get help from the type system. For example, the SafeHtml interface in GWT [1] gives you some safety from Java's type checking and can also make additional static analysis easier. (Then it becomes an exercise in making sure the API is used as intended.)

Perhaps something similar could be done using typedefs in C?

[1] http://www.gwtproject.org/javadoc/latest/com/google/gwt/safe...

Typedefs in C are just aliases, so given `typedef int foo;` one can freely use `int`s and `foo`s interchangeably, i.e. no checking by the compiler.

That said, one could use actual wrapper structs around the various types.
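
A rough sketch of what the wrapper-struct approach could look like. The type and function names here are hypothetical, made up for this example. Unlike typedef'd ints, two distinct struct types are not interchangeable, so the compiler rejects a tainted length where a checked one is expected.

```c
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct { size_t value; } tainted_len;  /* straight off the wire */
typedef struct { size_t value; } checked_len;  /* validated against a bound */

/* By convention, the only place a checked_len gets produced.
 * Returns 0 on success, -1 if the length exceeds max. */
int check_len(tainted_len in, size_t max, checked_len *out)
{
    if (in.value > max)
        return -1;
    out->value = in.value;
    return 0;
}

/* Consumers take checked_len, so passing a tainted_len (or a bare
 * size_t) is a compile error rather than a silent conversion. */
size_t bytes_to_copy(checked_len n)
{
    return n.value;
}
```

Nothing stops someone from filling in a checked_len by hand, so this is a convention the compiler helps with rather than a guarantee.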

A static analysis framework could treat certain typedefs specially since it's parsing the C code anyway.

Great idea! Too bad OpenSSL uses #define instead.

There's also sparse[0]'s address space annotation, which Linux uses for annotating data from userspace.

[0] https://git.kernel.org/cgit/devel/sparse/sparse.git/
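
For anyone who hasn't seen it, a small userspace sketch of the annotation style. The `__user` and `__force` macros follow the pattern older Linux kernel headers use when sparse defines `__CHECKER__` (newer kernels spell the address space differently), and they expand to nothing under a normal compiler. `copy_in` and `first_byte` are hypothetical names for this example; in the kernel the copy would go through copy_from_user().

```c
#include <stddef.h>
#include <string.h>

#ifdef __CHECKER__
# define __user  __attribute__((noderef, address_space(1)))
# define __force __attribute__((force))
#else
# define __user
# define __force
#endif

/* Hypothetical stand-in for the kernel's copy_from_user(): the one place
 * where the annotation is deliberately cast away. */
static void copy_in(void *dst, const void __user *src, size_t n)
{
    memcpy(dst, (__force const void *)src, n);
}

unsigned char first_byte(const unsigned char __user *src)
{
    unsigned char b;

    copy_in(&b, src, 1);
    /* Writing `return *src;` instead is the kind of direct dereference
     * the noderef attribute is meant to let sparse flag. */
    return b;
}
```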