by alex_smart 481 days ago
> Squashing any error, strangeness and warning can be very expensive in some projects

Strongly disagreed. Strange, unexpected behaviour of code is a warning sign that you have fallen short in defensive programming and you no longer have a mental model of your code that corresponds with reality. That is a very dangerous place to be in. It is very easy to find yourself stuck in quicksand not long afterwards.

4 comments

Depends a lot on the project, I think, as the parent comment suggests.
I feel like these categories are different. Warnings should generally be treated as errors in my book, and all errors should be corrected. But "strangeness" is much more open-ended. Sometimes large systems don't behave quite as expected and it might make sense to delay a "fix" until something is actually in need of it. If none of your tests fail then does it really matter?
> If none of your tests fail then does it really matter?

Yes. Absolutely.

You don't believe your software is correct because your tests don't fail. You believe your software is correct because you have a mental model of your code. If your tests are not failing but your software is not behaving correctly, then your mental model of your code is broken.

I agree for small systems. But as they get larger you often can't keep track of every last piece simultaneously. It can also become quite involved to figure out why a relatively obscure thing happened in a particular case.

Consider something like Unreal Engine for example. It's not realistic to expect to have a full mental image of the entire system in such a case.

At least in theory the tests are supposed to cover the observable behavior that matters. So I figure if the tests pass all is well. If I still find something broken then I need to add a test case for it.

> But as they get larger you often can't keep track of every last piece simultaneously

Sure, but then you divide the larger system into smaller components where each team is responsible for one or a few of these individual pieces and the chief architect is responsible for how the pieces are put together.

> At least in theory the tests are supposed to cover the observable behavior that matters. So I figure if the tests pass all is well. If I still find something broken then I need to add a test case for it.

But you sure as hell hope that the engineer implementing your database has a decent mental model of the thread safety of his code and does not introduce subtle concurrency bugs because his tests are still green. You also hope that he understands that he needs to call fsync to actually flush data to disk instead of going yolo (systems never crash and disks never fail). How are you supposed to cover the user-observable behavior in this case? Do you cut off the power supply to your system or unplug your disk while writing to the database, and then assert that all the statements that got committed actually persisted? And how many times do you repeat that test to really convince yourself that you are not leaving behind a bug that will only happen in production systems, say, once every three years?
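To make the fsync point concrete, here is a minimal sketch of what "actually flush to disk" tends to involve. The `durable_write` helper and the write-to-temp-then-rename pattern are my own illustration, not anything from a particular database, and even this only asks the OS to persist the data. Nothing in a normal unit test can tell this version apart from one that skips the `os.fsync` calls, which is the point.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data to path and ask the OS to persist it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, so the final
    # rename never exposes a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's userspace buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push its cache to the device
        os.replace(tmp_path, path)  # rename over the destination
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also fsync the directory so the rename itself is persisted.
    # O_DIRECTORY is not available on every platform, hence the guard.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A test that writes a file and reads it back passes with or without the two `os.fsync` calls. The difference only shows up when the machine loses power at the wrong moment.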

I am only giving database and multithreading as examples because they are the most obvious, but I think the principle applies more generally. Take the simplest piece of code everyone learns to write first thing in uni, quicksort. If you don't have a sufficient mental model for how that algorithm works, what amount of tests will you write to convince yourself that your implementation is correct?
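As a rough sketch of what I mean with quicksort: the comments below are the mental model, and the randomized check is the test. The names `quicksort` and `check_against_sorted` are just for illustration, and this is a simple copying three-way variant rather than the in-place textbook one.

```python
import random


def quicksort(xs: list) -> list:
    """Return a sorted copy of xs (simple three-way, not in place)."""
    if len(xs) <= 1:
        return list(xs)
    pivot = xs[len(xs) // 2]
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    # Why this is correct:
    # - every element lands in exactly one of the three lists,
    # - everything in `less` sorts before `equal`, which sorts before `greater`,
    # - `equal` contains at least the pivot, so both recursive calls get
    #   strictly shorter lists and the recursion terminates.
    return quicksort(less) + equal + quicksort(greater)


def check_against_sorted(trials: int = 200, seed: int = 0) -> bool:
    """Compare quicksort with the built-in sorted() on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        if quicksort(xs) != sorted(xs):
            return False
    return True
```

The random check raises confidence, but the three comments are what actually tell you that no input can break it.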

> then you divide the larger system into smaller components where each team is responsible for one or a few of these individual pieces and the chief architect is responsible for making sure of how the pieces are put together

And then you have ravioli code in the large. It is not going to make it easier to understand the bigger system, but it will make it harder to debug.

https://en.wikipedia.org/wiki/Spaghetti_code#Ravioli_code

If you can reproduce an error, you can fix it. Do that.

If you cannot reproduce it after a day of trying and it doesn’t happen often, don’t fix it.

Off topic but when I first saw ravioli code it was in a positive light, as a contrast to lasagna code. But then somewhere along the line people started using it in a negative manner.

There is some optimal level of splitting things up so that it's understandable and editable without overdoing it on the abstraction front. We need a term for that.

Probably not the best examples.

Sqlite famously has more test related code than database code.

Multithreading correctly is difficult enough that multiple specialized modeling languages exist and those are cumbersome enough to use that most people don't. In practice you avoid bugs there by strictly adhering to certain practices regarding how you structure your code.

You mention fsync but that famously does not always behave as you expect it to. Look up fsyncgate just for starters. It doesn't matter how well you think you understand your code if you have faulty assumptions about the underlying system.

Generally you come across to me as overconfident in your ability to get things right by carefully reasoning about them. Of course it's important to do that but I guarantee things are still going to break. You will never have a perfect understanding of any moderately large system. If you believe you do then you are simply naive. Plan accordingly (by which I mean, write tests).

I highly doubt anyone has a mental model of all the code they're working with. You very often work with code that you kind of understand but not fully.
I obviously meant the code that you own and are responsible for.
Same thing there.
And/or so are your tests.
No, they are not. They are a cheap way of verifying that something hasn't gone wrong, not a proof of correctness.

Tests failing implies the code is incorrect. Tests not failing does not imply that the code is correct.
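A toy illustration, with a made-up `is_leap_year` and a made-up test table: the function below is wrong, and every test is green.

```python
def is_leap_year(year: int) -> bool:
    # Deliberately incomplete: ignores the century rule
    # (years divisible by 100 are leap years only if divisible by 400).
    return year % 4 == 0


def run_tests() -> bool:
    """A plausible-looking test table that never probes the century rule."""
    cases = {2004: True, 2001: False, 2000: True, 2023: False}
    return all(is_leap_year(year) == expected for year, expected in cases.items())


print(run_tests())         # True: all tests pass
print(is_leap_year(1900))  # True, but 1900 was not a leap year
```

The suite only tells you the code agrees with the cases somebody thought to write down.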

> Tests not failing does not imply that the code is correct.

I don't think that's what's being suggested. Tests not failing when your code does implies that you are missing test cases. In other words things are underspecified.

Haskell is the extreme example of this. If it successfully compiles then it most likely does exactly what you intended but it might be difficult to get it to compile in the first place.

> Tests not failing when your code does implies that you are missing test cases. In other words things are underspecified.

I am really confused. Have you guys never written any multithreaded code? You can write the most disgusting thread-unsafe code without a single lock and be perfectly green on all your tests. And who in the world can write tests to simulate all possible timing scenarios to test for race conditions?

I give multithreading as just the most egregiously obvious example that this "tests can prove correctness" idea is fundamentally broken, but I think it applies more generally.
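Here is a small sketch of the kind of thing I mean. `UnsafeCounter`, `SafeCounter` and `hammer` are names I made up for the example. The unsafe version does a read-modify-write with no lock, so updates can be lost, yet on many runs (and depending on the interpreter version) the final count comes out right anyway and a test asserting on it stays green.

```python
import threading


class UnsafeCounter:
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        # Read-modify-write with no lock: two threads can read the same
        # old value and one of the increments is silently lost.
        self.value += 1


class SafeCounter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1


def hammer(counter, threads: int = 4, per_thread: int = 10_000) -> int:
    """Increment the counter from several threads and return the final value."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

`hammer(SafeCounter())` is guaranteed to return `threads * per_thread`. `hammer(UnsafeCounter())` is only guaranteed to return at most that, and whether a given test run catches the difference is down to scheduling luck.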

> Haskell is the extreme example of this. If it successfully compiles then it most likely does exactly what you intended but it might be difficult to get it to compile in the first place.

Absolutely 100% of the safety of Haskell comes from the mental model (functional programming, immutable data structures, etc.) and none from the test cases (although their community appears to even do testing slightly better than others).

I'm not saying that passing tests proves the code is correct, I'm saying that if you find a problem with the code that your tests don't pick up, then you should add a test for it.
100% this. If our product does something unexpected, finding out why is top priority. It might be that everything’s fine and this is just a rare edge case where this is the correct behaviour. It might be a silly display bug. Or it might be the first clue about a serious underlying issue.
I mean, it is expensive. It’s just that the alternative might be more so.
> might be

Yes.

Or it might be cheaper.