3 percent of Python codebases we checked had silently failing unit tests | HN Mirror

	3 percent of Python codebases we checked had silently failing unit tests (richardtier.com)
	98 points by rikatee 1582 days ago

14 comments

misnome 1582 days ago

We work with a rather large upstream scientific codebase in python. They have an innate distrust of anything that isn't written by them.

Their testing system depends on tests printing "OK" after every test. This means that in many cases, tests failing are indicated by the _absence_ of "OK" being printed.

(We've attempted to isolate those parts and write our own stuff testing against upstream in pytest. We once presented a proposal to move them to pytest, offering to do any work and even wrote pytest plugins to seamlessly integrate with their current system. We got a - literal - "Thanks, but no thanks.")

chriswarbo 1582 days ago

> Their testing system depends on tests printing "OK" after every test.

Oof. If they'd instead put "ok" before every test, they might have been accidentally compatible with TAP! https://testanything.org
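For readers who haven't met TAP: a report is a plan line (`1..N`) followed by one `ok N - description` or `not ok N - description` line per test, so a failure is stated explicitly rather than inferred from a missing "OK". Below is a minimal sketch of that idea. The `run_tap` helper and the two sample checks are hypothetical names made up for illustration, and only the basic line format is covered.

```python
from typing import Callable, List, Tuple


def run_tap(checks: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run each check and return TAP-style report lines.

    A check passes if it returns without raising. Any exception is
    reported explicitly as "not ok" instead of as a missing "ok".
    """
    lines = [f"1..{len(checks)}"]  # the plan: how many results to expect
    for number, (description, check) in enumerate(checks, start=1):
        try:
            check()
        except Exception:
            lines.append(f"not ok {number} - {description}")
        else:
            lines.append(f"ok {number} - {description}")
    return lines


def passing_check() -> None:
    assert 1 + 1 == 2


def failing_check() -> None:
    assert 1 + 1 == 3


report = run_tap([
    ("addition works", passing_check),
    ("addition is broken", failing_check),
])
print("\n".join(report))
```

Because the plan line announces the expected count up front, a consumer can also notice a test that never reported at all.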

rightbyte 1582 days ago

I think relying on stdout is kinda fine. I do tests with Tcl/Expect like that.

It is nice to not have to depend on the language runtime to do the test.

danuker 1582 days ago

There is a point where enough mismanagement makes relying on the said scientific codebase a liability rather than an advantage.

You could even be a victim of "Embrace, Extend, Extinguish".

My advice is to consider forking it, and poaching contributors, in the interests of common good.

adenozine 1582 days ago

And you’re unwilling to name the library?

I’d love to investigate this further.

popularonion 1582 days ago

That’s not the way I would do it, but is there really a problem here? Assuming the library itself isn’t littered with print statements that cause false positives.

Experience has taught me that the “right” testing framework for a project is whatever the developers are happy and productive with.

oli5679 1582 days ago

Did they catch exceptions and then print 'not ok'?

danuker 1582 days ago

> tests failing are indicated by the _absence_ of "OK" being printed

I guess they count the OKs, then bisect their test suite until they find the "not ok" test.

byteface 1582 days ago

I just grepped some sitepackages and saw this. Is it an example of what the author is saying?... https://github.com/gitpython-developers/GitPython/blob/main/...

If so, that would appear to be common, as it came up right away.

rikatee 1582 days ago

yes that's exactly the kind of problem :)

forgotusername6 1582 days ago

I've had unit test suites in the past that failed to run the test if the test failed to compile. Those were the worst. I only found out because I roughly knew how many tests I expected to be run
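The "I roughly knew how many tests to expect" check can be automated. A rough sketch with the standard `unittest` loader follows; the `ParserTests` class, the `run_with_expected_count` helper, and the expected counts are all hypothetical, and a real project would have to keep the expected number up to date by hand.

```python
import unittest


class ParserTests(unittest.TestCase):
    def test_parses_integer(self):
        self.assertEqual(int("2"), 2)

    def test_rejects_garbage(self):
        with self.assertRaises(ValueError):
            int("two")


def run_with_expected_count(case, expected_count):
    """Run a TestCase class; succeed only if the expected number of tests ran."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful() and result.testsRun == expected_count


count_ok = run_with_expected_count(ParserTests, 2)
count_mismatch = run_with_expected_count(ParserTests, 3)  # one test "went missing"
print(count_ok, count_mismatch)
```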

ianbicking 1582 days ago

During development I sometimes end my tests with assert 0, and only once I get to that failure do I know I've finished

dc-programmer 1582 days ago

I’ve shipped unit tests that would fail but don’t run by copying the function signature of the previous test then forgetting to change the name of the new one. Only one of the tests will run (the first?).

Maybe unit tests need unit tests? (There’s probably a lint rule to catch what I describe above)

jka 1582 days ago

> Maybe unit tests need unit tests? (There’s probably a lint rule to catch what I describe above)

Yep - meta-testing (ensuring that every unit test that exists in a project adds unique coverage, remains valid, runs as expected, and I'm sure many other properties) could (and should!) definitely be automated.

Some more advanced meta-testing could involve tracking changes to a project's source history over time (in other words: tests that run with commit history). By that I'm thinking of situations like: "does this test genuinely still test what it used to, after the test and/or application code was modified?"

rcxdude 1581 days ago

mutation testing is one example: if you make random changes (random in terms of transforming valid code to different valid code) to the code being tested, you should expect that the test will then fail. If not, there is some part of the code's behaviour which is not being tested.

dc-programmer 1582 days ago

Wow, that sounds like the future of testing. It’d be a hard sell to a manager now though. Some of those checks seem like they could be auto-generated though

pure_simplicity 1582 days ago

The second one will run, because as the file gets executed top down, the second declaration overwrites the first declaration, just like when you reassign a variable.

But yeah that would be a good thing for a linter to catch. I'm not aware if any do.
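A small sketch of the overwrite described above, using a made-up `DuplicateNameTests` class. The first `test_total` would fail if it ever ran, but the class body rebinds the name, so the loader only sees the second definition.

```python
import unittest


class DuplicateNameTests(unittest.TestCase):
    def test_total(self):
        self.fail("never runs: this definition is overwritten below")

    def test_total(self):  # same name: rebinds test_total in the class body
        self.assertEqual(sum([1, 2]), 3)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DuplicateNameTests)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())  # prints: 1 True
```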

njharman 1582 days ago

Huh? Seems super clear to me that "assertTrue" is asserting truthiness and not equality. It's right there in the method name! And if you don't know "True" means truthiness in Python, then you don't know the basics of Python.

A reviewer should catch this error easily. I kind of think many don't give much attention to unittests when reviewing. Which is bad. Good unittests are far harder to write than good code.

There are much more subtle errors of this class (false negatives / always pass).

js2 1582 days ago

The fact that this is a not uncommon mistake, despite the name, indicates that people make mistakes. They don't read. They're in a hurry. They see what they want to see.

The fix isn't to blame people for making mistakes. It's to figure out a design that doesn't allow this mistake to happen in the first place.

For example, the method could (today) require the second argument to be a keyword argument. This is also something a good linter should be able to warn on.

edit: rikatee and I wrote essentially the same reply at the same time. :-)

rikatee 1582 days ago

Agreed in a perfect world, but unfortunately any process that involves humans will involve human error.

We do code review because we expect human error when the code is written by a human, but then we expect no human error when the code is read (reviewed) by a human? Any process that expects zero human error will always fail.

That's where linters add value: they allow devs to do what humans are good at (the creative complex and interesting stuff) while the bots do what bots are good at (the boring repetitive stuff)

krab 1582 days ago

Why Shroeder's test? Does it have anything to do with Gerhard Schroeder? I fail to see what. :-)

Retr0id 1582 days ago

I wonder if they meant Schrödinger - the test could be both passing or failing, but we don't know until we use the correct function to check the results.

tialaramex 1582 days ago

Note that Schrödinger's thought experiment is intended to ridicule this way of thinking. Schrödinger is trying to suggest that since it's clearly nonsensical to imagine that maybe a whole cat can be both dead and alive the same would be true for other macroscopic subjects.

Instead popular culture has decided that at best, this is what Schrödinger believed (Ha those crazy scientists) and at worst that somehow the cats being dead and not-dead at the same time is the core idea of quantum physics :/

3pt14159 1582 days ago

Writing software for a long time really changes your perspective on things. It no longer seems weird that "the" cat is both alive and dead. It's just compression between two universes and until we open the box our perspective is the same.

marcosdumay 1582 days ago

> and at worst that somehow the cats being dead and not-dead at the same time is the core idea of quantum physics

Yet people are keeping larger and larger objects in a coherent state. Probably nobody will ever do it with a cat, but quantum physics is keeping its tradition of taking anything people think as absurd and saying "well, not really, look at this".

rikatee 1582 days ago

you're right thanks! Updated :)

carpenecopinum 1582 days ago

That typo/misread (Schroeder instead of Schrödinger) honestly almost causes it to make more sense to me. Because I don't really see how those tests relate to quantum mechanics. Instead, comparing those tests to a chancellor from the local labor party who was expected to help the situation of workers in the country, but essentially only used the office as a stepping stone to become a Russian oligarch, making the situation even worse in the process, makes plenty of sense to me...

rikatee 1582 days ago

oh dear, time for an edit as I did indeed mean Schrödinger!

ajuc 1582 days ago

I thought you meant the villain from ninja turtles ;)

rikatee 1582 days ago

deep ninja turtles lore

contravariant 1582 days ago

Maybe it's supposed to be Ernst Schroeder? Though I don't see the immediate connection.

generic-husky 1582 days ago

Nobody's going to talk about the cute fursona?

rikatee 1582 days ago

we were all waiting for you to

dragonwriter 1582 days ago

Though it's unlikely to get made in the existing testing library because it's hugely breaking, the API would be better if the assertXxx methods’ optional message argument were keyword-only, and assertTrue (and assertFalse) were replaced with assertIsTrue and assertTruthy (and assertIsFalse and assertFalsey.)
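A sketch of what such a split could look like as a local mixin. The `assertIsTrue` and `assertTruthy` methods below are hypothetical helpers following the names in the comment above. They are not part of `unittest`, and both take `msg` as keyword-only.

```python
import unittest


class StrictAssertions(unittest.TestCase):
    """Hypothetical helpers; not part of the standard library."""

    def assertIsTrue(self, expr, *, msg=None):
        # Identity check: only the singleton True passes.
        self.assertIs(expr, True, msg)

    def assertTruthy(self, expr, *, msg=None):
        # Truthiness check: same behaviour as today's assertTrue.
        self.assertTrue(expr, msg)


class Demo(StrictAssertions):
    def test_truthy_accepts_nonempty_list(self):
        self.assertTruthy([0])

    def test_is_true_rejects_nonempty_list(self):
        with self.assertRaises(AssertionError):
            self.assertIsTrue([0])

    def test_is_true_accepts_true(self):
        self.assertIsTrue(1 == 1, msg="comparison result should be True")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```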

robertlagrant 1582 days ago

Good lord I forgot how ugly pre-pytest was.

hyperzeit 1582 days ago

How come pytest does not appear in the post?

rikatee 1582 days ago

the post covers the built-in unittest package, which 28% of devs still use. But pytest is nicer to work with. I think brownfield codebases and inertia are the reason 28% of devs work (or have to work) with unittest

KptMarchewa 1582 days ago

Pytest easily runs unittest codebases, and you can just start writing new tests in pytest, and gradually move to it. Most of those left in pure unittest land are probably in some category of "amount of legacy is too large" or "I don't care anymore", and most probably some amount of both.

thraxil 1582 days ago

I actually prefer unittest style tests to pytest. I hop between languages a lot and find them easier to remember how to write when I'm doing Python. I also place a lot of value on minimizing the dependencies that I have to install and every codebase I see using pytest seems to also have to pull in a dozen other pytest plugins that then have to be reviewed, pinned, and updated. I also feel like whatever niceties pytest brings in to make writing tests easier are balanced out on Django apps by having to add a `@pytest.mark.django_db` decorator on basically every single test function.

ginja 1582 days ago

Regarding Django, if you add `pytestmark = pytest.mark.django_db` to the top of your file or organize your tests in classes and decorate those, then you won't have to decorate every single test :)

KptMarchewa 1582 days ago

I mean, pytest ones are the easiest. They are just functions. If you're not doing anything fancy, then you don't need to do anything else. Use normal assert, not some fancy functions. Only plugin I really use is coverage.

Minimizing runtime dependencies is nice, but personally I couldn't care less about build/test time dependencies.

I don't touch Django so can't comment on that though.

teddyh 1582 days ago

unittest is included in the Python standard library. Adding third-party libraries is a huge step to take for a project, and just “nicer to work with” does not cut it. Third-party libraries come and go, and depending on one means being subject to the storms of changes and lulls of inactivity and death. But the standard library is dependable.

njharman 1582 days ago

> Adding third-party libraries is a huge step to take for a project

Really? I can't think of a "real" project that did not include 3rd party libs.

Reinventing wheels or struggling with poorer implementations (unittest vs pytest, http vs requests, etc) is a huge drain for a project. A huge misstep.

teddyh 1582 days ago

I should have written “Adding each extra third-party library as an additional dependency is always a huge step to take for a project”.

Any one third-party library comes with these drawbacks, and each library must be evaluated individually. Some may be worth the pain (requests, etc.), but many are not. One should try to minimize the number of third-party dependencies one has, not necessarily eliminate them entirely. It’s simply that the fewer third-party libraries you depend on, the less pain you get of the kinds I listed. Every individual third-party library will have to overcome that threshold by being useful enough. And I doubt that, for most people, pytest is that much better to work with than the built-in unittest is.

boxed 1582 days ago

This (and so much more) will be caught by mutation testing. For python that means mutmut.

pc86 1582 days ago

I'm no Python expert, but why does assertTrue() even accept two arguments?

js2 1582 days ago

Explained later down in the post:

> assertTrue also accepts a second argument, which is the custom error message to show if the first argument is not truthy. This call signature allows the mistake to be made and the test to pass and therefore possibly fail silently.

justinsaccount 1582 days ago

The 2nd argument is the 'msg' argument.

With modern features in python you could change the signature to

  assertTrue(expr, *, msg=None)

which would prevent that issue.
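A quick sketch of that signature in action, done as a local override since the standard library's `assertTrue` still accepts `msg` positionally. `KeywordOnlyMsg` is a made-up class name for illustration.

```python
import unittest


class KeywordOnlyMsg(unittest.TestCase):
    def assertTrue(self, expr, *, msg=None):  # override: msg must be named
        super().assertTrue(expr, msg)


case = KeywordOnlyMsg()
case.assertTrue(1, msg="a named message is fine")

try:
    case.assertTrue(1, 2)  # the mistaken "compare two values" call
except TypeError as error:
    outcome = f"rejected: {error}"
else:
    outcome = "accepted"
print(outcome)
```

With the bare `*` in the signature, the mistaken two-value call raises `TypeError` at call time instead of passing silently.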

d0mine 1582 days ago

or just:

    assert expr, "custom message"

though given the verbose api, it is ok to require the explicit msg kwarg (duplication in the tests is ok if it makes them more robust)

guilherme-puida 1582 days ago

The second argument is the message that is displayed if the value is not truthy.

cyberia23424 1582 days ago

Second arg is a string message to display when test fails

6LLvveMx2koXfwn 1582 days ago

Indeed, from TFA;

"assertTrue also accepts a second argument, which is the custom error message to show if the first argument is not truthy. This call signature allows the mistake to be made and the test to pass and therefore possibly fail silently."

charcircuit 1582 days ago

Does unittest cause an exception when the second argument isn't a string? That would catch some of these I'd imagine.

lozenge 1582 days ago

They should make `msg` a keyword-only argument on all the TestCase.assert* statements. It wouldn't be the first breaking change of this type.

Searching for announced breaking changes about arguments to Python included functions...

https://bugs.python.org/issue25628

https://bugs.python.org/issue29193

https://bugs.python.org/issue36492

https://bugs.python.org/issue29209

rikatee 1582 days ago

it does not, it just prints them out in the console
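This is easy to check directly. In the sketch below a non-string `msg` is accepted without complaint: when the assertion passes it is never looked at, and when it fails the value is simply formatted into the failure text.

```python
import unittest

case = unittest.TestCase()

case.assertTrue(True, 42)  # passes; the int "message" is ignored

failure_text = None
try:
    case.assertTrue(False, 42)  # fails; msg is an int, not a str
except AssertionError as error:
    failure_text = str(error)
print(failure_text)
```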

dingosity 1582 days ago

well. yeah. they were probably written for 2.x or 3.y and being run in 2.(x+n) or 3.(y+n).

dcdc123 1582 days ago

Only 3 percent?

rikatee 1582 days ago

of the codebases checked yep (20 of 666 checked).

Bear in mind only 28% of codebases actually use the built-in unittest package that this gotcha affects, so really it's 20 of 28% of 666, aka roughly 10% ... but that claim would be hard to justify to folks who dig stats.
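The back-of-the-envelope numbers, spelled out. This assumes the 28% figure applies to these 666 codebases, which is exactly the part that is hard to justify, so treat the second number as a rough estimate.

```python
checked = 666          # codebases checked
affected = 20          # codebases with the silently passing assertion
unittest_share = 0.28  # assumed share of codebases using unittest

share_of_all = affected / checked
share_of_unittest_users = affected / (unittest_share * checked)

print(f"{share_of_all:.1%} of all checked codebases")                 # 3.0%
print(f"{share_of_unittest_users:.1%} of estimated unittest users")   # 10.7%
```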