Hacker News new | ask | show | jobs
by CodeBrad 1893 days ago
I agree, good code coverage can probably get most of the way there. But there are some problems with relying on coverage.

Coverage can be misleading. Maybe there is some function being executed in parallel:

  int global;
  void foo() {
    int local;
    
    if (complex_condition)
      bar(&local);
    else
      bar(&global);
  }
And assume maybe somewhere way down the call chain the data passed to bar is modified. The bar function and all of the functions called in bar can have 100% test coverage, but if the else branch is not covered, the race will be missed.

So without true 100% test coverage, it is possible races are being missed by dynamic tools, and true 100% test coverage is probably not possible or feasible in large complex projects.

1 comments

> but if the else branch is not covered, the race will be missed.

Yes.

> true 100% test coverage is probably not possible or feasible in large complex projects.

I have yet to encounter a single large complex project where 100% test coverage is not possible. I have encountered such projects where 100% test coverage required massive refactoring of the code to facilitate testing and doing so resulting in fixing most customer complaints received.

So you truly believe that Firefox or Windows or Ubuntu could not only be 100% covered by unit tests, but that it would actually be worth taking the time to do so? (edit: typos)
As someone who has worked on the code of Firefox, I strongly doubt that.

For instance, pretty much every piece of code (whether native or JS) I've seen is programmed defensively, assuming that for some reason invariants can be broken - something that typically cannot happen unless there is a memory corruption involved. Testing all these defensive programming branches is not possible without introducing memory corruptions.

Yes, I firmly believe that Firefox, Windows, and Ubuntu can achieve 100% unit test coverage of software. I truly believe it would be worth its while.
As Yoric mentioned, that would require some custom Qemu patch to corrupt memory, some test boxes with dodgy RAM, and/or other things along those lines.

Removing those "impossible" code branches is a bad idea. "Impossible" things happen millions of times a day to Firefox.

I used to work for LimeWire, which had tens of millions of users. We were sometimes seeing bug reports of "impossible" stack traces. They went down dramatically when we added a splash screen that calculated SHA-1 hashes of all of the DLLs and JARs before using the reflection API to load and start the actual application. Given that the "corrupted" JARs were still presumably passing the JARs' CRC-32 checksums and bytecode verification at class load time, I think we were most often just detecting dodgy RAM that would later have caused "impossible" errors by corrupting bytecode or native code after bytecode verification.

Writing defensively is great! There's nothing wrong with that.

If you're writing code to defend against "dodgy RAM" whose symptom is a corrupt SHA then it's super trivial to test that code: have a valid JAR file and adjust its content so that it will still "load" but isn't the file you expect. Then verify that the program refuses to load the JAR.

> have a valid JAR file and adjust its content so that it will still "load" but isn't the file you expect. Then verify that the program refuses to load the JAR.

Sorry for being a bit slow today. but what's the difference between "load"ing and loading? You clearly understand that you're using two different definitions of load, otherwise you wouldn't have put one in quotes. (Also, it would be self-contradictory if the two were identical.) However, I can't for the life of me figure out two different definitions for "load" which both involve CRC-32 and bytecode verification checks, and yet aren't identical.

If I'm correct, we saw a dramatic reduction in the "impossible" errors not because we were detecting the actual problem that caused the errors (corruption of internal JVM buffers after bytecode validation), but instead because our SHA-1 failures (loading chunks of files into our own buffers and checksumming those) were highly correlated with the "impossible" failures because they had the same underlying cause.

In our case, the JVM loaded the code from the JAR, which, unless I misunderstand Sun/Oracle's classloader, means that both CRC-32 checksums and bytecode verification pass, even for unsigned JARs. The chances of random bit flips passing CRC-32 checks are 4 billion to one. The odds of random bit flips passing both a CRC-32 check and bytecode verification are probably at least quadrillions to one. I'm pretty sure the corruption was happening minutes to hours after the classes actually got loaded by the JVM.

In other words, I think we were getting saved by SHA-1 summing a bunch of large files being a poor-man's memtest86: writing semi-random-ism patterns to big chunks of memory and verifying you can read back what you wrote. Failing this correlates heavily with seeing "impossible" errors at runtime, but we weren't directly examining the classloader's internal buffers that were the actual cause of the errors.

On a side note, modern OSes should really have a low rate memory checker built into the kernel's memory manager. It's a crutch, but cheap consumer IoT devices and WiFi APs aren't getting ECC any time soon.

Even disregarding tne very good points about code defending from memory errors mentioned by others, I imagine a realistic time frame for such coverage for a project the size of Firefox or Windows to be no less than a decade, with 0 other development going on. I find it absurd to imagine it would be worth it, even if it reduced bug count to 0 (which it wouldn't).