Hacker News
by the_sleaze9 911 days ago
Good story.

I for one do not believe in Unit Tests and try to get LLM tooling to write them for me as much as possible.

Integration Tests, however (which I would argue are what this story is actually praising), are _critical_ components of professional software. Cypress has been my constant companion and better half these last few years.

3 comments

Unit tests are useful for:

1) Cases where you have some sort of predefined specification that your code needs to conform to

2) Weird edge cases

3) Preventing reintroducing known bugs

In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does" and are a useless waste of time and effort.

> In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does" and are a useless waste of time and effort.

If you rephrase this as, "verifying that our code does what it did yesterday" these types of tests are useful. When I'm trying to add tests to previously untested code, this is usually how I start.

    1. Method outputs a big blob of JSON
    2. Write test to ensure that the output blob is always the same
    3. As you make changes, refine the test to be more focused and actionable
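
For illustration, here is a minimal sketch of those three steps in Python. `build_report` and the sample order are made up for the example (they stand in for whatever legacy method emits the blob), so treat this as one possible shape rather than a recipe.

```python
import json


def build_report(order):
    """Hypothetical legacy function whose output we want to pin down."""
    total = sum(item["qty"] * item["price"] for item in order["items"])
    return {"id": order["id"], "total": total, "lines": len(order["items"])}


def snapshot(value):
    # Serialize deterministically so the comparison is stable across runs.
    return json.dumps(value, sort_keys=True, indent=2)


ORDER = {"id": 7, "items": [{"qty": 2, "price": 5}, {"qty": 1, "price": 3}]}

# Steps 1 and 2: the blob recorded once from the current behaviour.
RECORDED = snapshot({"id": 7, "total": 13, "lines": 2})


def test_output_blob_is_unchanged():
    assert snapshot(build_report(ORDER)) == RECORDED


# Step 3: a narrower test that names the thing that actually matters.
def test_total_sums_quantity_times_price():
    assert build_report(ORDER)["total"] == 13
```

The blob test catches any change at all; the focused test is the one that still tells you something useful when it fails.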
The problem with this for me is that most of the time "verifying that our code does what it did yesterday" is not a useful condition: if you make no change to the code, it's going to do what it did yesterday. If you do make a change to the code, then you are probably intending for it to do something different, so now you have to change the test accordingly. It usually just means you have to make the same change in 2 different spots for every piece of unit-tested code you want to change.
> If you do make a change to the code, then you are probably intending for it to do something different, so now you have to change the test accordingly. It usually just means you have to make the same change in 2 different spots for every piece of unit-tested code you want to change.

Sure, but that's how unit-tested code works in general.

> then you are probably intending for it to do something different

If you have decided that your software is going to do something different, you probably want to deprecate the legacy functionality to give the users some time to adapt, not change how things work from beneath them. If you eventually remove what is deprecated, the tests can be deleted along with it. There should be no need for them to change except maybe in extreme circumstances (e.g. a feature under test has a security vulnerability that necessitates a breaking change).

If you are testing internal implementation details, where things are likely to change often... Don't do that. It's not particularly useful. Test as if you are the user. That is what you want to be consistent and well documented.

Then think of the unit test as the safety interlock.
I had to migrate some ancient VB.NET code to .NET 6+ and C#. The code outputs a text file, and I needed to make sure the new output matched the old output. I could have written some sort of test program, roughly equal in length to what I was rewriting, to verify that any change I made didn't affect the output and that the internal data was the same at each stage. Or... I could just output the internal state at various points and the final output to files and compare them directly. I chose the latter, and it saved me far more work than writing tests.

If I need to verify that my code works the same as it did yesterday, I can just compare the output of today's code to the output of yesterday's code.
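
A rough sketch of that kind of direct comparison, in Python rather than the .NET tooling from the story. The two strings are invented stand-ins for the dumped legacy and ported output files; only `difflib` from the standard library is used.

```python
import difflib


def diff_outputs(old_text, new_text):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="legacy",
            tofile="ported",
            lineterm="",
        )
    )


# Invented sample data: the ported code drops a trailing zero.
legacy = "ID,AMOUNT\n1,10.00\n2,3.50\n"
ported = "ID,AMOUNT\n1,10.00\n2,3.5\n"

for line in diff_outputs(legacy, ported):
    print(line)
```

The same function works for the intermediate-state dumps: write each stage to text, then diff stage by stage to find where the port first diverges.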

I see two advantages in creating tests to check output

    1. You did the work to generate consistent output from the code as a whole, plus output intermediate steps. Writing those into a test lets future folks make use of the same tests.
    2. Having the tests in place prevents people from making changes that accidentally change the output
Don't get me wrong, tests that just compare two large blobs of output aren't fun to work with, but they _can_ be useful, and are an OK intermediate stage while you get proper unit tests written.
> In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does"

That’s my experience too, especially for things like React components. I see a lot of unit tests that literally have almost the exact same code as the function they’re testing.

I often find that a little bit of code that helps you observe that your code is working correctly is easier than checking that your code is working in the UI. The tests are a great place to store and easily run that code.
> 3) Preventing reintroducing known bugs

When I was learning unit testing, my mentor taught me this strategy when fixing production bugs. First, write the unit test to demonstrate the bug. Second, fix the bug.
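
A small sketch of that order of operations. The `average` function and its empty-list bug are invented for the example; the point is only that the first test was written while the bug still existed, failed, and now guards against the bug coming back.

```python
def average(values):
    # Before the fix this was just `return sum(values) / len(values)`,
    # which raised ZeroDivisionError for an empty list.
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_average_of_empty_list_is_zero():
    # Step 1: written first to demonstrate the production bug (it failed).
    # Step 2: the guard clause above made it pass.
    assert average([]) == 0.0


def test_average_of_values():
    # Existing behaviour must survive the fix.
    assert average([2, 4]) == 3.0
```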

That's what you get when you don't write the tests first.
That's just doubling your work. If you don't already have a spec, your unit tests and actual code are essentially the same code, just written twice.
Determining which states are authentically hazardous and mocking data and adjacent services to make those states accessible at the press of a button is definitely not the same as writing code which handles those states appropriately.
You should try switching it up. Write the tests and then ask the LLM to write the code that makes them pass. I find I'm more likely to learn something in this mode.
I'd argue having useable LLMs kind of brings out how problematic TDD is.

Imagine the dumbest function you have to write: a product A and a street address as input, and the shipping cost as an output.

How many test cases would you write to be absolutely sure that function actually does what you want it to do, and be confident it doesn't have weird exceptions that the LLM injected randomly? I'd assume you'd still vet the code written by the LLM, but if it's hundreds of rambling lines doing weird stuff to get the right result, is it really faster than writing it yourself?

If it's hundreds of rambling lines then I'm not going to be able to get it past my linter anyhow (complexity thresholds), nor am I going to be able to get it past my team when they review it. So yeah, that's a problematic case, but it's one I'm going to have to refactor to avoid with or without an LLM in the loop.
About the problems of TDD: Cedric Beust has a legendary blog post about it here: https://www.beust.com/weblog/the-pitfalls-of-test-driven-dev...
TDD works best if you default to testing at the outer shell of the app, e.g. translating a user story into steps executed by Playwright against your web app, and only TDDing lower layers once you've used those higher-level tests to evolve a useful abstraction underneath the outer shell.

It seems to be taught in a fucked up way though where you imagine you want a car object and a banana object and you want to insert the banana into a car or some other kind of abstract nonsense.

How effective is the LLM when used this way, compared to normally?
I don't know what normally is, but I'd say it works pretty well.

Often the challenge is that the context for what you're trying to do is sprawling. There's just too many files and they're all too long: you end up exceeding the context window or filling it with 99% irrelevant stuff. Typically the structures you build for tests are smaller and more focused on the particular instance you're worried about, which I think is a better way to talk to an LLM.

You don't have to explain, for instance, that there's data in production which doesn't match the schema in the code so it must be cautious to avoid running afoul of that difference. Instead you've mocked that data, so it's right there in the same code with the test that it's trying to make pass.
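
To make that concrete, a sketch of what such a test file might look like. The `User` type, `load_user`, and both rows are hypothetical; the idea is that the mismatched production data sits right next to the assertion it has to satisfy, so nobody (human or LLM) needs it explained separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    email: Optional[str]


def load_user(row):
    """Hypothetical loader. The schema says `email` is always present and
    `id` is an integer, but older production rows break both rules."""
    return User(id=int(row["id"]), email=row.get("email"))


# Mocked production row: no email, and the id stored as text.
LEGACY_ROW = {"id": "42"}
# Row that matches the schema as written in the code.
CURRENT_ROW = {"id": 43, "email": "a@example.com"}


def test_loader_accepts_legacy_row():
    user = load_user(LEGACY_ROW)
    assert user.id == 42
    assert user.email is None


def test_loader_accepts_current_row():
    assert load_user(CURRENT_ROW) == User(id=43, email="a@example.com")
```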

In reality, unit tests and integration tests are different names for the same thing. All attempts at post facto differentiation fall flat.

For example, the first result on Google states that a unit test calls one function, while an integration test may call a set of functions. But as soon as you have a function that has side effects, then it will be necessary to call other functions to observe the change in state. There is nothing communicated by calling this an integration test rather than a unit test. The intent of the test is identical.

No. Or maybe only if you also consider 'village' and 'city' to be the same thing.
That's a good example, because while they're clearly different things, any distinction you draw between them such as "population > 100k" or "has cathedral" is always going to be a bit arbitrary, and many cities grew organically from villages in an unplanned manner.
Is it? Kent Beck, coiner of unit test, made himself quite clear that a unit test is a test that is independent (i.e. doesn't cause other tests to fail). For all the ridiculous definitions I have come across, I have never once heard anyone call an integration test a test that is dependent (i.e. may cause other tests to fail). In reality, a unit test and an integration test are the same thing.

The post facto attempts at differentiation never make sense. For example, another comment here proposed that a unit test is that which is not dependent on externally mutable dependencies (e.g. the filesystem). But Beck has always been adamant that unit tests should use the "real thing" to the greatest extent possible, including using the filesystem if that's what your application does.

Now, if one test mutates the filesystem in a way that breaks another test, that would violate what Beck calls a unit test. This is probably the source of confusion in the above. Naturally, if you don't touch the file system there is no risk of conflicting with other tests also using the filesystem. But that really misses the point.
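
A sketch of what "uses the real filesystem but stays independent" can look like, using Python's standard `unittest` and `tempfile`. `append_line` is a made-up function under test; each test gets its own temporary directory, so neither can break the other regardless of run order.

```python
import tempfile
import unittest
from pathlib import Path


def append_line(path, line):
    """Made-up function under test: append one line to a text file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


class AppendLineTests(unittest.TestCase):
    def setUp(self):
        # A fresh directory per test: real files, no shared state.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.log = Path(self._tmp.name) / "log.txt"

    def test_creates_file_on_first_write(self):
        append_line(self.log, "first")
        self.assertEqual(self.log.read_text(encoding="utf-8"), "first\n")

    def test_appends_to_existing_file(self):
        append_line(self.log, "a")
        append_line(self.log, "b")
        self.assertEqual(self.log.read_text(encoding="utf-8"), "a\nb\n")
```

If both tests wrote to one fixed path instead, the second would see the first one's leftovers, which is exactly the kind of dependence being described.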

There are only two kinds of tests: ones you need and ones you don't. Splitting hairs over names of types of tests is only useful if you're trying to pad a resume.
Clusters of humans cohabiting a confined space? If you squint hard enough…
Implying that integration tests (or vice versa) are legally incorporated like cities, while unit tests are not? What value is there in recognizing a test as a legal entity? Does the, assuming US, legal system even allow incorporation of code? Frankly, I don't think your comparison works.
I think he is not implying a hard-line legal standard. Rather, as connections and size increase, different properties start to emerge, and humans start to differentiate things based on that. There is a gradient, though, so we can find examples that are hard to classify.
What differentiates a city from a village is legal status, not size. If size means population, there are cities with 400 inhabitants, villages with 30,000 inhabitants, and vice versa. It is not clear how this pertains to tests.

When unit test was coined, it referred to a test that is isolated from other tests. Integration tests are also isolated from other tests. There is no difference. Again, the post facto attempts to differentiate them all fall flat, pointing to things that have no relevance.

> What differentiates a city from a village is legal status, not size

Fine. And legal status depends on location. There are many localities.

You should not be downvoted as heavily as you are now.

I feel like we did testing a disservice by specifying the unit to be too granular. So in most systems you end up with hundreds of useless tests testing very specific parts of code in complete isolation.

In my opinion a unit should be a "full unit of functionality as observed by the user of the system". What most people call integration tests. Instead of testing N similar scenarios for M separate units of code, giving you NxM tests, write N integration tests that will test those for all of your units of code, and will find bugs where those units, well, integrate.
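
A toy sketch of the N-versus-NxM idea. All four functions are invented: three small "units" and one entry point, `quote`, standing in for what the user of the system actually calls. The scenarios run through the entry point only, so each one exercises all three units and the seams between them.

```python
def parse_order(text):
    """'2x5.00' -> (quantity, unit price)."""
    qty, unit_price = text.split("x")
    return int(qty), float(unit_price)


def order_total(qty, unit_price):
    return qty * unit_price


def format_total(total):
    return f"${total:.2f}"


def quote(text):
    """The entry point a user of the system calls."""
    return format_total(order_total(*parse_order(text)))


# N scenarios, each checked once at the user-visible boundary.
SCENARIOS = [
    ("2x5.00", "$10.00"),
    ("1x0.99", "$0.99"),
    ("0x3.00", "$0.00"),
]


def test_quote_scenarios():
    for text, expected in SCENARIOS:
        assert quote(text) == expected, (text, quote(text))
```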