by cloogshicer 351 days ago
I think what people really mean when they say "This can't be tested" is:

"The cost of writing these tests outweighs the benefit", which often is a valid argument, especially if you have to do major refactors that make the system overall more difficult to understand.

I do not agree with test zealots that argue that a more testable system is always also easier to understand, my experience has been the opposite.

Of course there are cases where this is still worth the trade-off, but it requires careful consideration.


This is the case.

I did a lot of work on hardware drivers and control software, and true testing would often require designing a mock that could cost a million, easy.

I've had issues with "easy mocks"[0].

A good testing mock needs to be of at least the same Quality level as a shipping device.

[0] https://littlegreenviper.com/concrete-galoshes/#story_time

I've had a lot of success writing driver test cases against the hardware's RTL running in a simulation environment like Verilator. It's quick to set up and very accurate; the only downside is the time it takes to run.

And if you want to spend the time to write a faster "expensive mock" in software, you can run your tests in a "side-by-side" environment to fix any differences (including timing) between the implementations.
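A minimal sketch of the side-by-side idea, assuming a toy 8-bit counter peripheral. `ReferenceCounter` stands in for the Verilator-built RTL model (which in practice would be driven through its generated C++ interface) and `FastCounter` for the faster "expensive mock" in software; both class names, the register names and the stimulus format are hypothetical. The same stimulus is applied to both models one step (or clock tick) at a time, and every visible register is compared after each step.

```python
# Hypothetical stand-in for the RTL simulation: an 8-bit counter that
# increments on each clock tick while ENABLE is set.
class ReferenceCounter:
    def __init__(self) -> None:
        self.enable = 0
        self.count = 0

    def write(self, reg: str, value: int) -> None:
        if reg == "ENABLE":
            self.enable = value & 1
        elif reg == "COUNT":
            self.count = value & 0xFF

    def read(self, reg: str) -> int:
        return self.enable if reg == "ENABLE" else self.count

    def tick(self) -> None:
        if self.enable:
            self.count = (self.count + 1) & 0xFF  # 8-bit wraparound


# Hypothetical fast software model with a deliberate bug: no wraparound.
class FastCounter(ReferenceCounter):
    def tick(self) -> None:
        if self.enable:
            self.count += 1


REGS = ("ENABLE", "COUNT")


def run_side_by_side(steps, ref, fast):
    """Apply identical stimulus to both models.

    Each step is ("write", reg, value) or ("tick",). Returns the first
    mismatch as (step_index, reg, ref_value, fast_value), or None.
    """
    for i, (op, *args) in enumerate(steps):
        for model in (ref, fast):
            if op == "write":
                model.write(*args)
            elif op == "tick":
                model.tick()
        # Compare all visible state after every step, so timing differences show up too.
        for reg in REGS:
            a, b = ref.read(reg), fast.read(reg)
            if a != b:
                return (i, reg, a, b)
    return None


stimulus = [("write", "COUNT", 254), ("write", "ENABLE", 1), ("tick",), ("tick",)]
print(run_side_by_side(stimulus, ReferenceCounter(), FastCounter()))  # → (3, 'COUNT', 0, 256)
```

The fast model's missing wraparound is caught on the second tick after loading 254, at the exact step where the two implementations diverge.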

It's cool to learn about Verilator: I've been proposing that our HW teams give us simulations based on their HW designs for us to target with SW, but I am so out of the loop on HW development that I can't push them in this direction (because they'll just tell me "that's interesting, but it's hard", which frustrates me to no end).

Can you perhaps do a write-up of what you've done and how "slow" it was, and if you've got ideas to make it faster?

The hardest part is toolchains, for two reasons. First, Verilator doesn't have complete SystemVerilog (SV) language support, although it's gotten better. Second, hardware has a tendency to accumulate some of the most contorted build systems I've ever seen, and most hardware engineers don't actually know how to extricate it.

Once it's actually successfully run through Verilator, it's a C++ interface. Very easy to integrate if your sim already has a notion of "clock tick."

I like to turn it on its head: a proper fake for any component should be designed by the authors of the component, since they can provide one with the same behaviour relatively cheaply.

With hardware, I try to ask for simulated HW based on the designs, but I usually never get it.

If I'm working next to the hardware group, I generally write my own. This allows me to make progress on drivers/firmware before hardware is available. If it's an ASIC, we can even spend a little time making it run in the DV environment: they get vectors for free, and overall we get more confidence that the firmware/driver is going to work on delivered silicon.
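A minimal sketch of writing the driver against your own simulated device, assuming a hypothetical memory-mapped peripheral with `STATUS`, `DATA` and `CTRL` registers. The driver routine `send_word` only depends on a small `RegisterBus` interface, so the same code could later run against a real bus or a simulation backend; `FakeDevice` is the hand-written simulator. All names and the register map are illustrative assumptions.

```python
from typing import Protocol


class RegisterBus(Protocol):
    """The only thing the driver depends on: 32-bit register reads/writes."""
    def read32(self, addr: int) -> int: ...
    def write32(self, addr: int, value: int) -> None: ...


# Hypothetical register map.
STATUS = 0x00  # bit 0: ready
DATA = 0x04
CTRL = 0x08    # bit 0: start


class FakeDevice:
    """Software simulation of the device, following the same register map."""
    def __init__(self) -> None:
        self.regs = {STATUS: 0, DATA: 0, CTRL: 0}
        self.sent: list[int] = []

    def read32(self, addr: int) -> int:
        return self.regs[addr]

    def write32(self, addr: int, value: int) -> None:
        self.regs[addr] = value & 0xFFFF_FFFF
        if addr == CTRL and value & 1:
            # "Transmit" the DATA register, then report ready.
            self.sent.append(self.regs[DATA])
            self.regs[STATUS] |= 1


def send_word(bus: RegisterBus, word: int, max_polls: int = 100) -> None:
    """Driver routine: load DATA, set CTRL.start, poll STATUS.ready."""
    bus.write32(DATA, word)
    bus.write32(CTRL, 1)
    for _ in range(max_polls):
        if bus.read32(STATUS) & 1:
            return
    raise TimeoutError("device never became ready")


dev = FakeDevice()
send_word(dev, 0xABCD)
print(dev.sent)  # → [43981]
```

If the same driver later fails on silicon, the disagreement is narrowed to "where does the fake differ from the real design", which is the conversation described below.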

If something doesn't work on actual hardware, now we're in a really good place to have a conversation. Clearly the simulator differs from the actual design, and we can just focus on sussing that out. Otherwise the conversation can be a lot more difficult and can devolve into 'hardware's broken' vs. 'software person doesn't have a clue'.

> A good testing mock needs to be of at least the same Quality level as a shipping device.

Unfortunately I disagree: it needs to be the same quality as the device. If your mock is reliable but your device isn't, you have a problem.

Good point.

That was actually sort of the problem I had, in my story.

Sounds like you’re responding to the title without listening to the presentation. He literally says this in the intro.

It's often shorthand for "this can't be unit tested" or "this isn't dependency injected", even though integration tests are perfectly capable of testing non-DI code.
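To illustrate that last point, a minimal sketch assuming a hypothetical `load_threshold()` that reads a hard-coded `app_config.json` from the working directory, with no way to inject the file or the filesystem. An integration-style test can still cover it by creating a real file in a temporary directory instead of mocking `open()`.

```python
import json
import os
import tempfile


def load_threshold() -> int:
    """Not dependency-injected: reads a hard-coded file from the working directory."""
    with open("app_config.json") as f:
        return int(json.load(f)["threshold"])


def test_load_threshold_from_real_file() -> None:
    # Integration-style test: set up a real file rather than mocking open().
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as d:
        os.chdir(d)
        try:
            with open("app_config.json", "w") as f:
                json.dump({"threshold": 42}, f)
            assert load_threshold() == 42
        finally:
            # Leave the temp dir before it is deleted.
            os.chdir(old_cwd)


test_load_threshold_from_real_file()
```

The cost is a slower, more environment-dependent test, but no refactor of the code under test is needed.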

The author's claims that we should isolate code under test better and rely more on snapshot testing are spot on.

> rely more on snapshot testing are spot on

Never quite liked "snapshot testing", which I think has a better name in "golden master testing" or similar anyway.

The reason for the dislike is that it's basically a codified "Trust me bro, it's correct" without actually making clear what you are asserting with that test. I haven't found any team that used snapshot testing and didn't also need to change the snapshots for every little change, which obviously defeats the purpose.

The only thing snapshot testing seems to be good for is when you've written something and you know it'll never change again, for any reason. Beyond that, unit tests and functional/integration tests are much easier to structure so that you don't waste so much time reviewing changes.
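For readers who haven't used golden master testing, a minimal sketch, assuming JSON-serializable output and one golden file per test in a directory of your choosing. `check_golden` is a hypothetical helper, not any particular library's API; its `update` parameter plays the role of the "update the snapshots" flag mentioned later in the thread. Note that nothing in it says which parts of the output matter, which is the objection raised above.

```python
import json
from pathlib import Path


def check_golden(name: str, actual, golden_dir: Path, update: bool = False) -> None:
    """Compare `actual` against golden_dir/<name>.json.

    With update=True, or when no golden file exists yet, the golden file
    is (re)written instead of compared.
    """
    golden_dir.mkdir(parents=True, exist_ok=True)
    path = golden_dir / f"{name}.json"
    # Stable serialization so that key order doesn't cause spurious diffs.
    text = json.dumps(actual, indent=2, sort_keys=True) + "\n"
    if update or not path.exists():
        path.write_text(text)
        return
    if text != path.read_text():
        raise AssertionError(f"{name}: output differs from golden file {path}")
```

The whole serialized output is the assertion, which is why a reviewer has to infer intent from the diff rather than from the test body.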

The purpose of snapshot testing is to not have observable changes if you think there shouldn't be observable changes. To that end, a pattern I like is:

- Don't store/commit the snapshot or maintain an "update" command for it. Instead, your CI/CD should run both versions of the software (before and after the change) and diff them. That eliminates a lot of the toil.

- You should have a completely trivial way to mark that a given PR intends to have observable changes. That could be a tag on GitHub, a square-bracket thing in a commit message, etc. Details don't matter a ton. The point is that the test just catches if things have changed, and a person still needs to determine if that change is appropriate, but that happens often enough that you should make that process easy.

- Culturally, you should split out PRs which change golden-affecting behavior from those which don't. A bundle of bug fixes, style changes, and a couple new features is not a good thing to commit to the repo as a whole.
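The CI side of that pattern might look roughly like the sketch below, assuming the pipeline has already run the base-branch build and the PR build over the same inputs and collected each output as text. The `[golden-change]` marker and the `golden_gate` function are hypothetical stand-ins for "a tag on GitHub, a square-bracket thing in a commit message, etc."; nothing is committed to the repo, and an unmarked PR fails only if its observable output differs.

```python
import difflib

# Hypothetical opt-in marker for "this PR is meant to change observable output".
INTENDED_CHANGE_MARKER = "[golden-change]"


def golden_gate(old_outputs: dict, new_outputs: dict, commit_message: str) -> tuple[bool, str]:
    """Return (ok, report); old/new map each test input name to its output text."""
    diffs: list[str] = []
    for name in sorted(old_outputs.keys() | new_outputs.keys()):
        old = old_outputs.get(name, "")
        new = new_outputs.get(name, "")
        if old != new:
            diffs.extend(difflib.unified_diff(
                old.splitlines(keepends=True), new.splitlines(keepends=True),
                fromfile=f"base/{name}", tofile=f"pr/{name}"))
    report = "".join(diffs)
    if not diffs:
        return True, "no observable changes"
    if INTENDED_CHANGE_MARKER in commit_message:
        # Still surface the diff: a person decides whether the change is right.
        return True, "intended observable change, please review:\n" + report
    return False, "unexpected observable change:\n" + report
```

A perf-only PR passes silently when outputs match and fails loudly when they don't, while a marked PR passes but still carries the diff for a human reviewer.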

The net effect is that:

1. Performance improvements, unrelated features, etc are tested exactly as you expect. If your perf enhancement changes behavior, that was likely wrong, and the test caught it. If it doesn't, the test gives you confidence that it really doesn't.

2. Legitimate changes to the golden behavior are easy to institute. Just toggle a flag somewhere and say that you do intend for there to be a new button or new struct field or whatever you're testing.

3. You have a historical record of the commits which actually changed that behavior, and because of the cultural shift I proposed they're all small changes. Bisecting or otherwise diagnosing tricky prod bugs becomes trivial.

>I haven't found any team that used snapshot testing and didn't also need to change the snapshots for every little change, which obviously defeats the purpose

I don't see how this even defeats the point, let alone obviously.

If a UI changes I appreciate being notified. If a REST API response changes I like to see a diff.

If somebody changes some CSS and it changes 50 snapshots, it isn't a huge burden to approve them all and sometimes it highlights a bug.

> I don't see how this even defeats the point, let alone obviously.

You generally don't want to have to change all the tests for every change, particularly for implementation details. Usually when people do snapshot testing of, for example, UI components, they serialize the entire component and then assert that the full component is the same as the snapshot, so any change requires the snapshot to be updated.
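A small illustration of that difference, assuming a hypothetical `render_button` component that returns HTML: the full-snapshot check breaks when only the CSS class is renamed, while a targeted check named after the behaviour it cares about keeps passing.

```python
from functools import partial


def render_button(label: str, disabled: bool = False, css_class: str = "btn") -> str:
    """Hypothetical component renderer returning HTML."""
    attrs = f'class="{css_class}"' + (" disabled" if disabled else "")
    return f"<button {attrs}>{label}</button>"


SNAPSHOT = '<button class="btn" disabled>Save</button>'


def snapshot_check(render) -> None:
    # Full-snapshot style: the entire serialized output must match.
    assert render("Save", disabled=True) == SNAPSHOT


def disabled_attribute_check(render) -> None:
    # Targeted style: only what the check's name promises is asserted.
    html = render("Save", disabled=True)
    opening_tag = html.split(">")[0]
    assert "disabled" in opening_tag
    assert ">Save</button>" in html


# A purely cosmetic change: same behaviour, different CSS class.
restyled = partial(render_button, css_class="btn-primary")
```

Here `snapshot_check(restyled)` raises `AssertionError` while `disabled_attribute_check(restyled)` passes; whether that is a bug or a feature is exactly what the thread is arguing about.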

> If somebody changes some CSS and it changes 50 snapshots, it isn't a huge burden to approve them all and sometimes it highlights a bug.

Let's say person A initially created all these snapshots, and person B made a change that shows 50 snapshots changed. Whose responsibility is it to make sure the snapshots are correct? Person A doesn't have the context of the change, so less ideal. Person B doesn't know the initial conditions Person A had in mind, so also less ideal.

When you have unit tests and functional tests you can read through the test and know what the person who wrote it wanted to test. With snapshots, you only know that "This was good", sometimes only with the context of the name of the test itself, but no assertions you can read and say "Ah, X still shows Y so all good".

>You generally don't want to have to change all the tests for every change

You generally don't want every change to result in a lot of work. If changing a lot of tests means looking at a table of 30 images and diffs, scanning for problems and clicking an "approve button", that isn't a lot of work though.

>Let's say person A initially created all these snapshots, and person B made a change that shows 50 snapshots changed. Whose responsibility is it to make sure the snapshots are correct?

The person who made the change.

>Person B doesn't know the initial conditions Person A had in mind, so also less ideal.

Yes they will because the initial conditions also had a snapshot attached. If your snapshot testing is even mildly fancy it will come with a diff too.
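One way a "mildly fancy" tool could also make 50 changed snapshots less tedious to review: a sketch, assuming text snapshots keyed by name, that computes each snapshot's changed lines with `difflib.ndiff` and groups snapshots whose changes are identical. A single CSS tweak then shows up as one group, and a snapshot that changed differently (say, snapshot 37) is listed on its own. `group_changes` is hypothetical, not the behaviour of any particular snapshot library.

```python
import difflib
from collections import defaultdict


def changed_lines(old: str, new: str) -> tuple[str, ...]:
    """Only the removed/added lines, so identical edits produce identical keys."""
    return tuple(
        line for line in difflib.ndiff(old.splitlines(), new.splitlines())
        if line.startswith(("- ", "+ "))
    )


def group_changes(old: dict[str, str], new: dict[str, str]) -> dict[tuple[str, ...], list[str]]:
    """Map each distinct change to the names of the snapshots showing exactly that change."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for name in sorted(old):
        if name in new and old[name] != new[name]:
            groups[changed_lines(old[name], new[name])].append(name)
    return dict(groups)


old = {f"card{i}": f"title {i}\ncolor: blue\n<button>Buy</button>" for i in range(50)}
new = {k: v.replace("color: blue", "color: navy") for k, v in old.items()}
new["card37"] = new["card37"].replace("<button>Buy</button>", "")  # the real bug
for change, names in group_changes(old, new).items():
    print(len(names), "snapshot(s):", change)
```

Reviewing two groups instead of 50 near-identical images is the kind of tooling that decides whether "just approve them all" is realistic.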

>When you have unit tests and functional tests you can read through the test and know what the person who wrote it wanted to test. With snapshots, you only know that "This was good",

If you made a change and you can see the previous snapshot, current snapshot and a diff and you never know if the change was ok then you probably shouldn't be working on the project in the first place.

And no, the same isn't necessarily true of unit or functional tests - I've seen hundreds of unit tests that assert things about objects and properties which are tangentially related to the end user and come with zero context attached and I have to try and figure out wtf the test writer meant by "assert xyz_obj.transitional is None". With a user facing snapshot it's obvious.

No regular developer will carefully review 50 changed snapshots. They'll stop doing a proper job of it after the third or fourth that looks like the same trivial unimportant change and miss the bug found by snapshot 37.

I do agree that a lot of people write bad tests, meaning the test name does not properly describe what the test is supposed to be about, so I can't check the test implementation and assertions against intent. They also, like you say, assert on superfluous things.

The problem with snapshots is that it's doing the exact same thing. It asserts on lots of completely unimportant stuff. Unlike proper unit tests however I can't make it better. In a unit test I can make an effort to educate my peers and personally do a good job of only asserting relevant things and writing individual tests where the test name explains why "transitional has to be None".
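As a sketch of that kind of test, assuming a hypothetical `Order` class with a `transitional` field (loosely modelled on the `xyz_obj.transitional` example above): the test name states the requirement, and the body asserts only that requirement rather than the whole object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    """Hypothetical domain object; `transitional` holds an in-flight state name."""
    status: str = "draft"
    transitional: Optional[str] = None

    def begin_payment(self) -> None:
        self.transitional = "paying"

    def complete_payment(self) -> None:
        self.status = "paid"
        self.transitional = None  # nothing may remain "in flight" once paid


def test_completed_payment_leaves_no_transitional_state() -> None:
    # The name says why `transitional` must be None; the body checks only that.
    order = Order()
    order.begin_payment()
    order.complete_payment()
    assert order.transitional is None
```

If the requirement changes, there is one named assertion to revisit instead of a serialized object to re-approve.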

Snapshots are a blunt, no-effort tool for the lazy dev that later requires constant vigilance and overcoming things humans are bad at (like carefully checking 50 snapshots) by many different humans, vs. a unit test that I can make good and easy to comprehend and check by putting in effort once when I write it. A good one will also be easy to adjust if it comes time to actually change an assertion.

> scanning for problems and clicking an "approve button", that isn't a lot of work though.

But you're actually mentally listing requirements for each one of those snapshots you check, which hopefully are the same ones the previous person who ran it had in mind, but who knows?

> Yes they will because the initial conditions also had a snapshot attached. If your snapshot testing is even mildly fancy it will come with a diff too.

Maybe I didn't explain properly. Say I create a component and use snapshot testing to verify that "This is how it should look". Now the next person changes something that makes that snapshot "old", and that person needs to look at the diff and the new component and say "Yeah, this is now how it should look". But there are a lot of things that are implicitly correct in that situation, instead of explicitly correct. How can we be sure the next person is mentally checking the same requirements as I did?

> If you made a change and you can see the previous snapshot, current snapshot and a diff and you never know if the change was ok then you probably shouldn't be working on the project in the first place.

It seems to work fine for very small and obvious things, but when people make changes that affect a larger part of the codebase (which happens from time to time if multiple people are working on a big codebase), it's hard to implicitly understand what's correct everywhere. That's why unit/functional tests are so helpful: they tell us explicitly what results we should expect.

> I've seen hundreds of unit tests that [...] With a user facing snapshot it's obvious.

I agree that people generally don't treat test code with as much thought as other "production" code, which is a shame I suppose. I guess we need to compare "well done snapshot testing" with "well done unit/functional testing" for it to be a fair comparison.

For that last part, I guess we're just gonna have to agree to disagree, most snapshot test cases I come across aren't obvious at all.

Are you really going to be reviewing all those 50 snapshots carefully?

The bound on testing from the "other side" is to test just enough that you don't increase the maintenance burden too much.

In the talk, he also talks about passing a flag which would actually update the snapshots/golden files if needed.

Testing is a skill. The more you do it, the less expensive it becomes.

The main cost isn't writing the tests themselves but the increased overall system complexity. And that never goes down.
> but the increased overall system complexity

I think this happens because people don't treat the testing code as "production code" but as something else. You can have senior engineers spending days on building the perfect architecture/design, but when it comes to testing, they behave like juniors and just write whatever comes to mind first, and never refactor things like they would "production code", so it grows and grows and grows.

If people could spend some brain-power on how to structure things and what to test, you'd see the cost of the overall complexity go way down.