by dorinlazar 1558 days ago
Like any TDD proponent (and most cultists, really), the author insists that „you're not doing it right”, despite reading the scriptures. You're always misunderstanding, YOU are the reason TDD doesn't work, no, TDD is itself flawless. There's always a slight misinterpretation of the magic words! But TDD works, you're the sinner for not using it right!

And then you have shocked people when they find out that 100% test coverage doesn't mean that you really have a bug-free codebase.

11 comments

And on the flip side: what if they're right and you are just doing it wrong?

I understand the point you're making. But, I find this retort to be every bit as frustrating as the rhetoric you're criticizing. It gives us permission to dismiss things we don't fully understand or that require experience and practice to master as snake oil.

Surely most of us would struggle to pick up general relativity or neurosurgery even after weeks or months of training. But we're not--I hope--going to dismiss our neurosurgeon instructor when they say we were cutting something incorrectly even though we REALLY THOUGHT we were doing it right this time.

Maybe TDD really is awesome. And maybe, simultaneously, it's not easy to write a 100% prescriptive guide for how to apply it to every kind of project.

I feel similarly about the rhetoric around OOP. Nine times out of ten, someone complains about OOP and cites some kind of Poodle-Dog-Animal class hierarchy. Then someone comes and says that program object relationships aren't really supposed to be taxonomical. Instead of taking that to heart and wondering if maybe they can try OOP again with a different mindset, the response is defensive. "Surely OOP is still terrible because I was taught the wrong way. And if it's possible to employ a technique ineffectively, then it must be a terrible technique. OOP sucks and they're in a cult so they can't admit it."

Do you have any idea how many wrong ways there are to use the controls in an automobile? Yet we still mostly blame the drivers when they cause a collision because they were using it wrong.

For what it's worth, I think the article is right about the evolution of the term "unit test", and the author mentions the "classicist" approach to unit testing, which is really the one that makes sense with TDD. The "mockist" approach to unit testing is "simpler" because it just makes an individual class or function the "unit", but that will make tests brittle and make a test-driven approach much more verbose and cumbersome. It also happens that the mockist approach is the default in today's programming languages, testing frameworks, IDEs, etc.
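
A rough illustration of the difference, as a minimal Python sketch. `TaxTable`, `checkout_total` and the 8% rate are made-up names and numbers for illustration only, not anything from the article. The first test treats the behaviour as the unit and uses the real collaborator; the second isolates the function with a mock and also asserts on the interaction.

```python
from unittest.mock import Mock


class TaxTable:
    """Hypothetical collaborator: looks up a tax rate per region."""

    def rate_for(self, region: str) -> float:
        return {"NY": 0.08}.get(region, 0.0)


def checkout_total(net: float, region: str, taxes: TaxTable) -> float:
    """Hypothetical function under test."""
    return round(net * (1 + taxes.rate_for(region)), 2)


def test_classicist():
    # Unit = behaviour. Uses the real TaxTable and checks only the outcome.
    assert checkout_total(100.0, "NY", TaxTable()) == 108.0


def test_mockist():
    # Unit = one function. The collaborator is replaced by a mock and the
    # test also pins down how the collaborator is called.
    taxes = Mock(spec=TaxTable)
    taxes.rate_for.return_value = 0.08
    assert checkout_total(100.0, "NY", taxes) == 108.0
    taxes.rate_for.assert_called_once_with("NY")


test_classicist()
test_mockist()
```

If `checkout_total` were later refactored to look rates up some other way, the first test would keep passing as long as the total is right, while the second would need rewriting. That is the brittleness being described.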

I don't know, your examples feel a bit off. Surgery spends a ton of time on the techniques of surgery, sure. And yet we still had a proliferation of excess back surgery that was, in retrospect, unnecessary and we have taken effort to stop that. Similarly for driving, you are right that media often blames the drivers; but civil engineering looks at the roads that have more wrecks than others and looks to see why that is.

TDD suffers because it is a tool. As soon as it exists for its own sake, expect that it will be overused. This is more complicated because tests are themselves a tool. So you have a tool that exists only for the sake of another tool, and it is not hard to see why this one is less clear-cut.

I definitely didn't spend a lot of mental energy on my analogies, so I'm not going to defend them per se. But your counterpoints actually make me more comfortable with the analogies, not less.

I'm not claiming that TDD is good and that it's only the practitioner's fault when things go wrong. Rather, my point is that it's hard to know, as in the civil engineering example you describe, and that we should be humble enough to acknowledge that "you're doing it wrong" might be a true statement despite how smart we believe ourselves to be.

In that, I agree. I don't know of many (any?) tools that are by nature bad. I just think there are many that are oversold.

That is, my point wasn't to say that you can't do it wrong. Rather, doing it right may not further the end goal. Just look up the article of someone trying to TDD a sudoku solver. It is painful, even though there really isn't any one thing that the person did wrong.

> But TDD works, you're the sinner for not using it right!

I often tell my colleagues that if technologies or methodologies are widely misunderstood, then practitioners aren’t to blame.

TDD might be great, but I have yet to see it widely succeed because its adoption is troublesome. It’s a bit like any of these newest products that promise to address everything consumers demand, only to fail miserably against the same old, leaving the few adopters asking themselves why commoners didn’t get it.

I would agree… except that MOST programming technologies and processes are widely misunderstood, and it feels increasingly so.

Get 10 random senior programmers in a room, and see if they agree on matters of design, code structure, testing, frameworks, system architecture. I’ve been part of summits trying to do this several times over the decades, and the only agreement we wound up with was that “Layered abstractions are sometimes a good idea”.

There is a lot of duplication of effort in languages, frameworks, and tools driven by “I can’t be bothered to learn this new idea, I know better”, and a lot of misunderstanding of even first principles.

Something like TDD (and other XP practices like pairing) reminds me of diet/fitness/nutrition regimens that are notoriously difficult to comply with. This isn’t to say they don’t work: athletes and bodybuilders do exist, as do many successful TDD practitioners. It’s just human nature to avoid things that require mental change, or to be prone to self-deception in their application.

> Get 10 random senior programmers in a room, and see if they agree on matters of design, code structure, testing, frameworks, system architecture.

I see this often when the problem is abstract. When the problem is concrete, I rarely see this come up in practice.

I routinely have 4-5 senior engineers in a room agree on architecture, design and methodology for a single, specific, project. To me, these abstract disagreements are a symptom of a generalization mismatch.

> Something like TDD (and other XP practices like pairing) reminds me of diet/fitness/nutrition regimens that are notoriously difficult to comply with.

But these regimens do work. It's just that they don't work for the reasons most people believe.

Take fasting. Most benefits of fasting in casual practitioners come from the obvious fact that they aren't consuming as many calories as they did before, so they lose weight. That doesn't mean fasting works in a different way from, say, just eating less.

Same goes for TDD. It may be beneficial because it forces a culture of testing and code coverage, but the root of such benefits may not be writing tests before code, yet that's what most people agree TDD is about.

I thought this was exactly the point of TDD and fasting but maybe I'm going to the wrong church.

Another reason why it's so hard to actually say "TDD does/doesn't work" is that everyone has their own idea what it is.

It's like everything in software engineering: everyone has an opinion, few have actually tested theirs with performant and stable production deployments.

I think, additionally, it's domain specific.

There are problems where TDD is actually a decent approach. If you have solid requirements beforehand, and your problem does reduce to small bite sized units with trivial state, then it's amazing for producing code that doesn't surprise you when you run it.

There are problems where it really really really isn't. Things where there is ambiguity in the expected outcome. Language processing, for example, is fairly difficult to tackle with TDD. There is no simple state, there is no unambiguous expected outcome. Whether "jump" is a verb or a noun non-trivially depends on the context, and it may even be ambiguous. The correctness of such code is a percentage value, not a boolean.
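
A minimal sketch of what "correctness as a percentage" can look like in a test, in Python. The toy tagger, the four labelled samples and the 0.75 threshold are all invented for illustration; none of this comes from a real language-processing system.

```python
def toy_tag(word: str, previous: str) -> str:
    """Hypothetical heuristic: a word right after 'to' is a verb, otherwise a noun."""
    return "verb" if previous == "to" else "noun"


# (previous word, word, expected tag): a tiny made-up labelled sample.
LABELLED = [
    ("to", "jump", "verb"),
    ("a", "jump", "noun"),
    ("the", "jump", "noun"),
    ("they", "jump", "verb"),  # the heuristic gets this one wrong
]


def accuracy(tagger, samples) -> float:
    """Fraction of samples for which the tagger returns the expected tag."""
    hits = sum(1 for prev, word, tag in samples if tagger(word, prev) == tag)
    return hits / len(samples)


def test_tagger_accuracy():
    # The assertion is a threshold on a score, not an exact expected output.
    assert accuracy(toy_tag, LABELLED) >= 0.75


test_tagger_accuracy()
```

A test like this can guard against regressions, but it says nothing about which individual inputs are wrong, which is part of why writing it first does not drive the design the way a boolean test would.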

For context, when I debug the keyword extraction for my search engine crawler, I'm looking at test output that looks like this: <https://memex.marginalia.nu/pics/frog-text.png> (blue are individual keywords, red are potential n-grams).

Isn't the acronym, test driven development?

All that means to me is that when we start a project we think about what it means for an implementation to be correct and write a test. If we agree on the specification, the test, then a passing implementation means that we have implemented it correctly -- for that particular test.

One of the challenges of using unit tests for this kind of testing is, as you say, the difficulty of modelling continuous values.

It is also difficult because the "proof" is so small and trivial. You need thousands of examples to gain some confidence... and for even moderately complex software this can not be enough.

Unfortunately, the path to writing good tests and verifying software requires a bit of mathematical vocabulary and reasoning which many developers are too shy to learn or outright hostile towards ("I've never needed no stinking math, this is programming!").

My point is: as long as you're driving the development of your software using testing, I think you're practising TDD.

Unfortunately, it's hard to be put into a position where you can apply one, let alone multiple new approaches to software development on a team developing a large application for a long time (say at least 2 years). You'll have team churn, you'll have requirements churn, you'll have understanding of methodologies and practical realities change, etc.

So by "testing their (opinions)", I usually understand people to mean that they've been on a project where they've enjoyed working on something in a particular way (people always underestimate what someone is simply enjoying doing, I have no idea why). I understand that almost nobody is able to compare and contrast two significantly different approaches, let alone establish things like incident rates or cost of improvement and such resulting from application of those approaches (because there's no reference point).

What we do need is for all developers to contribute to setting up a foundation dedicated to establishing software development methodology quality, and then have that set up a dozen teams of 4-8 engineers working on a dozen longish problems (6-36 months) with different methodologies, and then aggregate and analyze results in 12 years. All the engineers would have market-rate salaries, so we'd all have to be chipping in monthly :D

I mean, just imagine where we'd be if we'd done that in 2010? (probably at the same place, but just maybe not!)

Or TMTOWTDI.

Or there are hidden assumptions that differentiate between a Google solution and a coffee shop solution.

Performant... stable... both have ridiculously variable definitions based on who you talk to. That alone is probably enough to get your 10 Sr. developers to provide different solutions.

I don’t think the assumptions are hidden. There is plenty of literature with a reasonable discussion of the range of definitions of performance and scale. It’s that people aren’t even looking for them. It’s about “Team Red vs Team Blue”. Or whatever buzzword needs to be on your resume alongside leetcode for the next interview. NoSQL vs SQL, FP vs OOP. Can’t get my raise if I’m building coffee shop solutions!

The counter to TMTOWTDI is a set of principles and discipline, at least within a team or a project. Not to outright reject the alternatives, but to recognize the power of consistent behaviour and prioritization of tradeoffs, to try to counter the brand allegiances.

> I don’t think the assumptions are hidden.

And yet in the very article we're talking about, there's no explanation of what assumptions the author is operating within.

I'll even go so far as to say that most technical articles omit the assumptions, because it's boring content and is less likely to engage the readers.

Most senior engineers I talk to would give you "conditional answers" on those, and I wouldn't be surprised at all.

> Get 10 random senior programmers in a room, and see if they agree on matters of design, code structure, testing, frameworks, system architecture

Programming consists of different schools of thought, similar to philosophy or economics, thus conflicting viewpoints are not necessarily wrong per se, but they collide with your school of thought. Therefore there will always be some disagreement even between the most experienced programmers.

Now there are of course still things that are right or wrong on a factual level within programming, but because different schools also label things as right or wrong, it can be hard to distinguish what is grounded in philosophical reasons from what is based on factual reasons.

In my experience, there is less disagreement about how to solve a problem than about the problem being solved.

I've found that within the high performing engineers, they tend to agree quite often with the solution once they agree upon the priorities of features and the problems being prevented.

Development practices are not difficult to comply with because you make them work for you, not the other way around.

The issue that I see in software engineering is the XOR approach. Top-down vs bottom-up, abstraction vs implementation. This is the wrong thinking. One of the classic books on software architecture said, to paraphrase, "The architecture drives the implementation and the implementation drives the architecture". A classic book on software development said, to paraphrase, "You design for the implementation you need and implement for the design you want".

It's a yin and yang. But many software engineers completely fail at this. They get stuck at implementation or design. You need both at the same time. They're trying to represent a many-dimensional problem one or more dimensions lower than required. They're flat-worlders trying to describe a cube or even a tesseract. Of course there are going to be misunderstandings and mistakes. Everyone has a different view of the cube. They're technically talking about the same thing, yet fail spectacularly when the parts don't quite align when they come together. The problem is very simple in a higher dimension.

Where are the lines drawn?

Take Agile. It clearly outlines that you cannot have managers getting in the way. Yet, I don't know how many companies I've seen trying "Agile" without giving up their managers.

If you're not even willing to move past the first step, indeed it is doomed to failure. But surely you should recognize that when you decide to go down that road?

Removing the managers from the equation requires a team of developers who have a very business-oriented mindset. I expect a lot of developers don't have that, and may not even be capable of it. For any given business, chances are that does not describe your team. I think it is fair to say that Agile is not realistic in the vast majority of cases.

Is the fault Agile for being written for a narrow scope of situations or for practitioners trying to adopt it where it was not designed to fit? I would suggest the latter.

> Is the fault Agile for being written for a narrow scope of situations or for practitioners trying to adopt it where it was not designed to fit? I would suggest the latter.

I believe Agile is one of the main offenders here. You suggest that Agile is not "realistic" in most cases, and I agree with that, but Agile consultants and practitioners like to say that Agile is about taking what works for you, and adapting it to your needs.

I think Agile adoption by people who believe in this is doomed. When Agile is seen as a set of nebulous guidelines from which you can pick and choose, it just doesn't work, and in turn, these experiences feed the idea that Agile doesn't work as a whole.

Agile with managers

- We have rearranged your sprints because of a super duper important deadline for a very important customer.

- But we've already begun this sprint!

- Yeah, but we are Agile.

It's like everything in programming. The 1% of the elite can use almost any methodology with great success, but the other 99% constantly fail to perform.

The problem solving aspect of programming is hard for most people. Nearly all of the high performers that I know are on the high functioning autism spectrum. They spearhead the hard technical problems, and set the stage for everyone else to tag along with the busy work.

It takes all kinds to make a team, but we still need to recognize everyone's strengths and weaknesses.

And not every problem in programming is a technical problem. There are more issues in communication and understanding the problem being solved. Doesn't do much good to have a technically correct product that doesn't quite solve the problem.

What software engineering needs is a methodology in management to identify and properly utilize everyone to their unique abilities.

Like everything in programming, the cultural part is more difficult than the technical part. I've seen more than one coworker comment out all my tests. If you don't create the correct culture, and there isn't management buy-in, it won't work.

I believe that in medicine, if a treatment fails because the patient doesn't follow it correctly, it is considered a failure of the treatment (of course better education can be part of the solution).

I worked with a guy who has since transitioned to teaching Agile methodologies, and one day he was talking about his mentor, 'Jay' and the methodology they used within Jay's consulting practice.

Most of the successful projects, Jay brought in a team of people he had worked with before. So at some point you have to ask if it's the methodology that makes the project successful, or the people who make the methodology work.

If you have ever been a Lead, or even a viable candidate for one, you've had meetings with others to conspire to be successful within the bounds of whatever rules management won't budge on. How to make this number say what we want it to say. How to make that graph go in the right direction. In these cases we are camouflaging our personal methodologies to pass for someone else's. We are succeeding in spite of them, because if we told them what we were doing they might make us stop.

So when they look back at the project, all they see is the things we let them see. The opinion of the person who introduced a change is never the one you can trust. You should ask the people in the trenches what they think.

How many of the people who misunderstand the technology or methodology actually learned it from some kind of official or primary source, though?

Compare that to how many of us learned "OOP" or "TDD" or "FP" or "DDD" from a few blog posts.

Is it physicists' fault that I can "learn" that quantum mechanics says CERN opened up a wormhole and alternate timelines from a crystal salesperson on Facebook?

There are a number of tech fads that were widely, and unknowingly, promoted as a panacea but whose applicability is limited to particular specific circumstances.

Microservices is another.

TDD gets pushed especially hard (I think) because when it works well it works REALLY well and because it can be quite literally addictive - the red/green being like sounds on a slot machine that generate a hit of dopamine.

That's a recipe for some passionate promotion.

In fairness to the original author, his heterodox form of TDD does widen its applicability beyond its traditional scope of complex, stateless algorithmic code with simple APIs to the more common integration code that involves databases, etc. that predominates in most commercial code bases.

But, what tech innovation DIDN’T start as something with limited applicability? The only way to know is to discuss the benefits and tradeoffs in something approaching social science.

But humans struggle with rigor, it’s much easier to brand and market something, or to buy in to brands. So “ideas” like microservices become brands. And they’re misunderstood and misapplied because people don’t read the copious literature that discusses the tradeoffs and variations. And they don’t practice it as a discipline with someone that has mastered the technique successfully: they do it blind.

Same goes for TDD or other XP practices that are often derided as cultish. A discipline is even harder to adopt than a design philosophy like microservices. Disciplines are about consistent behaviour. To an outsider, it’s freaky. But calling it cultish as some do is like saying Karate or another martial art is a cult. From the outside it kind of looks like it, but discipline or kata (practice of form) is known to be a success multiplier for the sustained successful application of practices.

If you don’t have a dojo or a sensei, could you teach yourself such a set of martial arts to mastery? If not, why do we expect everyone to pick up TDD after reading a book?

TDD gets pushed because it creates great, easily trackable metrics one can gesture to as evidence that a) your code is good and b) that you're doing a good job and should be paid more.

It makes developers happy because it translates the somewhat arcane nature of the work into something easily digestible by management, and is a fig leaf for shoddy work.

It makes management happy because it goes nicely on a chart that can be shown to the director/customer/shareholders, and looks good at status meetings. It also gives them something to poke at and micromanage.

It makes the customer/shareholders happy because it provides a metric that their money is being spent _doing something_.

TDD may have started as a coding best practice, but it exists and endures - and will continue to exist and endure - because it's performative, and the performance has value to every layer of a business, even though it has nothing to do with actually making the product better at this point.

The author has described what they think would make the practice of TDD better. The rationale presented isn’t ridiculous. Reasonable people could disagree on it.

> you have shocked people when they find out that 100% test coverage doesn't mean that you really have a bug-free codebase.

This applies especially to business people.

100% test coverage just means you tested that s*t with all you expected...

...but your software does not live isolated.

Some external system may have a bug, or may inject some unforeseen values, or a solar flare might flip the value of a bit in your system and crash your software.

I see a completely different problem. Just because there are branches in your code doesn't mean there are branches in its inputs. And just because there are branches in the inputs doesn't mean they are reflected in the code. So you may have 100% test coverage, but be testing branches which aren't going to be taken, while at the same time completely missing the branches that are in the inputs, which you'll fail to handle. Example:

  fn abs(x: i32) -> u32 {
      x as u32
  }

  #[test]
  fn test_abs() {
      assert_eq!(5, abs(5));
  }

Boom, 100% test coverage! But the tests are actually very low quality. There is a bug when x is negative. That's why property-based testing is nice. It uncovers branches in the inputs (although the task of generating a representative set for the inputs is sometimes non-trivial).

After discovering the bug, one may even write a branchless implementation of this function for performance without updating the test, and it will still be 100% coverage. But the arithmetic has "logical branches" which do not look like ifs, instead they generate qualitatively different results for different inputs.
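
To make the idea concrete, here is a minimal hand-rolled sketch in Python of a property check against that `abs`. It is not a real property-testing library (something like Hypothesis would generate and shrink inputs for you); `buggy_abs` is a Python stand-in for the Rust `x as u32` cast, and all function names are made up for this example.

```python
import random

I32_MIN, I32_MAX = -2**31, 2**31 - 1


def buggy_abs(x: int) -> int:
    """Stand-in for the Rust `x as u32`: reinterpret an i32 bit pattern as u32."""
    return x & 0xFFFFFFFF


def fixed_abs(x: int) -> int:
    return -x if x < 0 else x


def example_test(abs_fn) -> bool:
    # The single example from the comment: it executes every line of abs_fn.
    return abs_fn(5) == 5


def find_counterexample(abs_fn, trials: int = 1000, seed: int = 0):
    """Property: the result is non-negative and equals either x or -x."""
    rng = random.Random(seed)
    # Try a few boundary values first, then random i32 values.
    candidates = [0, 1, -1, I32_MAX, I32_MIN]
    candidates += [rng.randint(I32_MIN, I32_MAX) for _ in range(trials)]
    for x in candidates:
        result = abs_fn(x)
        if result < 0 or result not in (x, -x):
            return x
    return None


assert example_test(buggy_abs)                # passes with full line coverage
assert find_counterexample(buggy_abs) == -1   # the first negative input fails
assert find_counterexample(fixed_abs) is None
```

The example-based test and the property check exercise exactly the same lines; only the second one explores the "branch" that lives in the input range.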

I tend to find logical/algorithmic bugs overemphasized. Property tests are nice and work very well for extremely complex algorithmic code that has simple inputs and outputs, but most code I write at work simply isn't that.

In school I wrote parsers. At work I have done it but it's rare.

In fact if you write commercial code you'll often find that the code that you rely on that is like that weirdly gets concentrated in open source libraries which you will be testing only indirectly.

Certainly when writing commercial code I find that the majority of bugs lurk in the interstitial spaces between subsystems I am integrating or in misunderstandings about how the overall system or subparts of it are supposed to behave.

And property tests are not much help there and unit tests are often a hindrance (because they're as likely to bake in wrong assumptions).

TDD is some help with this but only if A) it's paired with BDD to exorcize specification bugs and B) done with integration tests that exercise all parts of the system together.

The sort of programs you are writing about can easily exhibit the sort of problem that yakubin is giving an example of - here's a comparable example:

  wasBusinessDay(date): Date.dayOfWeek(date) not in (Date.Saturday, Date.Sunday)

Which is as trivial as yakubin's example, but can fail for more, and much more nuanced, reasons - and no, you are not, in general, going to find an off-the-shelf answer to the question of whether your particular business was operating on a given day.

Both your claim that a big problem is things not working as they are supposed to[1], and the fact that the same old basic security problems get repeated in every new platform that comes along, show that libraries have not, in practice, been any sort of panacea for business software.

[1] I suspect you are referring to application code rather than library code, but either will do for the point I am making here.
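
As a minimal sketch, the `wasBusinessDay` one-liner above might look like this in Python. The "closed on 1 January" rule is a made-up business rule for illustration; the point is only that a fully covered function can still give the wrong answer for a particular business.

```python
from datetime import date


def was_business_day(d: date) -> bool:
    """Python rendering of the one-liner: weekdays only."""
    return d.weekday() not in (5, 6)  # 5 = Saturday, 6 = Sunday


# A single test executes every line of the function.
assert was_business_day(date(2024, 1, 3))   # a Wednesday

# Hypothetical rule: this particular shop is closed on 1 January.
new_years_day = date(2024, 1, 1)            # a Monday
# The function says "business day", which is wrong for this shop,
# yet no coverage tool will flag the missing case.
assert was_business_day(new_years_day)
```

No amount of input generation finds this kind of bug either, because the missing knowledge (which days the business actually operated) is not in the code at all.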

I think property tests work well on larger things; with a super basic assertion they're essentially just fuzzers. You can point Hypothesis at a Swagger spec and let it test your API like this.

I built a property testing library back in the day when I made a library for creating UIs. I was then able to write tests like

* Given an arbitrary UI

* And an arbitrary up/down/left/right list of user direction presses (this was the only way of navigating)

* If they press a direction and the focus moves, pressing the opposite direction takes them back to where they were before

This uncovered bugs in interfaces like this with 3 items, bottom left is in focus and you press right. Now press left, users probably want to go back to the bottom left rather than top left.

    ┌─────┐  ┌─────┐      ┌─────┐ ┌──────┐
    │     │  │     │      │     │ │      │
    │     │  │     │      │     │ │      │
    └─────┘  │     │      └─────┘ │      │
             │     │ ────►        │focus │
    ┌─────┐  │     │      ┌─────┐ │      │
    │focus│  │     │      │     │ │      │
    │     │  │     │      │     │ │      │
    └─────┘  └─────┘      └─────┘ └──────┘


Another one was

* Given an arbitrary list of API calls to add/remove/change the UI and user direction presses

* There is always only one item in focus, or no items at all

This actually uncovered a specification bug. We had two requirements

1. Always have an item in focus if there are any that can be in focus

2. If you delete all the items, then add one back in, it's not focussed

Those conflict, but we never noticed, and even had passing unit tests for both cases.
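
A minimal Python sketch of how a random sequence of operations can expose that kind of conflict. `FocusList` is a hypothetical toy model, not the UI library described here, and the check is hand-rolled with the standard `random` module rather than a property-testing library.

```python
import random


class FocusList:
    """Hypothetical model: a list of items and at most one focused item."""

    def __init__(self):
        self.items = []
        self.focused = None
        self.was_emptied = False

    def add(self, item):
        self.items.append(item)
        # Requirement 2 as implemented: once everything has been deleted,
        # a newly added item does not take focus.
        if self.focused is None and not self.was_emptied:
            self.focused = item

    def remove(self, item):
        self.items.remove(item)
        if self.focused == item:
            self.focused = self.items[0] if self.items else None
        if not self.items:
            self.was_emptied = True


def invariant_holds(ui: FocusList) -> bool:
    # Requirement 1: if any item exists, exactly one of them is focused.
    return (ui.focused in ui.items) if ui.items else (ui.focused is None)


def find_violation(trials: int = 200, seed: int = 0):
    """Run random add/remove sequences; return the first one breaking the invariant."""
    rng = random.Random(seed)
    for _ in range(trials):
        ui, ops, next_id = FocusList(), [], 0
        for _ in range(rng.randint(1, 8)):
            if ui.items and rng.random() < 0.5:
                item = rng.choice(ui.items)
                ui.remove(item)
                ops.append(("remove", item))
            else:
                ui.add(next_id)
                ops.append(("add", next_id))
                next_id += 1
            if not invariant_holds(ui):
                return ops
    return None


violation = find_violation()
assert violation is not None and violation[-1][0] == "add"
```

Each requirement passes its own unit test in this model; it is only the combination "add, remove everything, add again" that breaks the invariant, which is the sort of sequence random generation stumbles on quickly.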

I think property tests can map very nicely to the level of system description that we typically want. I'd love to see larger integration with BDD tools.

Maybe it's true for CRUD apps. I was speaking from the experience of writing compilers. From what I've read, logical/algorithmic bugs are also common in gamedev. I'm most interested in algorithmic code though, so that may be a bias.

Logical and algorithmic code (& bugs) were a lot more common in gamedev before the onset of the open source game/physics engines. It has followed a similar evolution to the rest of software development whereby core algorithmic components tend to get shared in generic, battle hardened OSS systems like Unity. Most games will rely on their APIs rather than implement their own physics or rendering engines and this is only becoming more common over time.

College students who start out by writing toy compilers, quicksort implementations and physics engines will often get a distorted view of what professional development is actually like.

Someone has to write those libraries you're using, and they need to test them. Maybe you're satisfied with having a job consisting of gluing libraries together, but not everyone is like that and there is no need to be patronizing towards them.

abs on machine-sized integers has another easy-to-make mistake: what's the absolute value of the signed byte -128?

You face some hard decisions on that one. None of the error handling paradigms are good at handling the class of numeric errors that people really, really want to pretend don't exist because no matter what kind of error they throw they're a major pain to deal with correctly.

I'd agree, if the result type was the same as the input type. However, notice that the function I wrote returns an unsigned type, so in the case of byte-sized types it would be fn abs(x: i8) -> u8, i.e. there is enough space for 128. Otherwise, good point.
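
A small Python sketch of the byte-sized case, using `ctypes` fixed-width integers to mimic two's-complement wrapping. The helper names are invented for this example; it only illustrates why the return type matters.

```python
import ctypes


def abs_i8_to_i8(x: int) -> int:
    """Absolute value squeezed back into a signed byte (wraps on overflow)."""
    return ctypes.c_int8(abs(x)).value


def abs_i8_to_u8(x: int) -> int:
    """Absolute value stored in an unsigned byte, as in an i8 -> u8 signature."""
    return ctypes.c_uint8(abs(x)).value


assert abs_i8_to_i8(-127) == 127
assert abs_i8_to_i8(-128) == -128   # 128 does not fit in a signed byte and wraps
assert abs_i8_to_u8(-128) == 128    # but it does fit in an unsigned byte
```

Rust's standard library makes the same choice explicit with `checked_abs`, `wrapping_abs` and `unsigned_abs` on its signed integer types.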
100% test coverage only covers one dimension of the code, the lines. The other dimension is the data range, and proper coverage is generally not measured.

So with 100% test coverage you still have tons of possible codepaths that can go sideways, way before including external variability.

My best testing practice is writing tests as needed based on instinct. YES. I can feel whether a function needs a test or not. It hints at it by giving me a little anxiety about the expected result of that function. And I don't care about 100% coverage anymore.

But this approach doesn't scale. How can you expect everyone to have this intuition? Agreed that 100% coverage is stupid.

So currently there isn't any approach that scales at all. Everyone will write tests differently.

And that's why we have code review process, and senior software engineers?

---

EDIT: add an example

When you look at the body of a function and think that if someone else changes some code you will have a hard time figuring out what went wrong, that's the time you need to add some tests to it. Often it's a function that took a number of iterations to work as you expect, or a function that is not simple but looks so easy that you wrote it right in one go.

I’ve never seen an approach that scales. Once a team passes a certain size, it all turns into a disorganized shit show. Communication is hard, and as the pressure to earn revenue intensifies, there’s less & less time or buy-in for craftsmanship.

Yep, true. I'm really familiar with the vibe of time-to-market vs not to fuck customers up. Modules that are directly related to value for customers get more tests.

Why is 100% coverage stupid? I agree that there could be configuration or data classes that do not make sense to test, so these could be ignored by the coverage tool. Is it still stupid to aim for 100% coverage in the rest of the relevant not-ignored code?

100% coverage as a measure of success is stupid.

And (this gets into personal opinions) sacrificing depth on tests on critical codepaths in order to spend time getting to 100% coverage is also stupid.

Time is finite. Money is finite. 100% coverage provides nothing but false assurances and pretty green metric lights.

100% coverage measured by lines of code being hit (which is what most, if not all, tools that measure code coverage do) is actually insufficient for proper test coverage.

It's easy to get "100% code coverage" for the below function:

  def nths(number: int, divisor: int) -> float:
      return number/divisor

  def test_nths():
      assert nths(1, 1) == 1

Yet it's obvious that test is not really taking care of anything.

What you need is 100% semantic coverage, i.e. your tests should cover all potential outcomes of the code at appropriate levels.
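As a rough sketch of what that could look like for the `nths` function above: the cases below (fractions, negatives, zero numerator, zero divisor) are my own assumption about which outcomes matter, not an exhaustive list.

```python
def nths(number: int, divisor: int) -> float:
    return number / divisor


def test_nths_semantic():
    # Same line coverage as the one-assert test, but more outcomes exercised.
    assert nths(1, 1) == 1        # identity case
    assert nths(1, 2) == 0.5      # non-integer result
    assert nths(-1, 2) == -0.5    # sign handling
    assert nths(0, 5) == 0        # zero numerator

    # A zero divisor is an outcome too: decide what should happen and pin it down.
    try:
        nths(1, 0)
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("expected ZeroDivisionError for divisor 0")
```

Both versions report 100% line coverage; only this one says anything about division by zero.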

This is an area where suitable extracurricular activities can give more perspective on things.

In many disciplines there are activities that end up feeling like 'scaffolding', learning exercises that you don't necessarily do all the time. They make you a better practitioner the first time you do them, continue to add value for some time, and then are useful from time to time as a refresher.

I think if you've never done TDD, you're missing out. If you had to be dragged kicking and screaming into it, you're missing out (and have possibly damaged yourself in the process). You don't need to make a cult out of it (and I suspect some of the resistance to it is from people who feel like they're being asked to join one), any more than you have to religiously clean your shoes to see value from having a set of cleaning tools.

The same for pair programming. It makes you look at some classes of problem in a new way, and those lessons can stick with you even when you are working alone on a personal project.

Most of the time these days I tend to do a testing sandwich. When I'm still trying to figure out what the APIs will allow I'm free-form writing code trying to stick the concepts together. At some point I get stuck trying to juggle corner cases, and I realize that I'm grinding gears and need to write test code for a while. At this point I still get a lot of the ergonomics benefits of TDD because I'm not yet wedded to my implementation (low sunk cost) - because it doesn't work anyway.

Outside Smalltalk, TDD has never "worked" for me. But in a Smalltalk environment it was an absolute no-brainer and super slick because the environment was set up for it to just work. The nuance is the code-build-test cycle. In Smalltalk, they were all one and the same. In other languages and with other VMs, that cycle is disconnected. Likewise, once we moved to web applications, things changed. Now the unit we want to test is the API, or the widget, not really the function per se, but every TDD tradition has been at a much lower level. It overlooks that it takes real time and energy to build out tests at that higher level, and there are better tools than what have traditionally been "TDD."
But being 100% bug free is not the goal of TDD?
We are in the year 2022. We don't have the technology to produce bug free software based solutions.

100% code coverage, as others mentioned in this thread, does not guarantee you exercised 100% of all possible input values.

That is why:

- You can pay millions of dollars to a software company, and when you install the product, the usual license text says something in capitals along the lines of: NO GUARANTEES WHATSOEVER FOR ANYTHING... plus some will mention you can't use it for controlling X-ray machines, nuclear power generators and so on.

and also why

- Software that lives depend on will normally use a kind of summation- or consensus-based design, like the Shuttle had with 3/4 computers.
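To make the consensus idea concrete, here is a toy majority voter. It is only a sketch: `majority_vote` is a hypothetical name, and this is not a description of how the Shuttle's computers actually voted.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value a strict majority of redundant channels agree on.

    Raises RuntimeError when no value has a strict majority, so the caller
    can fall back to a safe state instead of trusting a single channel.
    """
    if not outputs:
        raise ValueError("no channel outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among channel outputs")
    return value
```

With three channels, one faulty result is outvoted: `majority_vote([42, 42, 41])` returns `42`, while three different answers raise instead of guessing.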

I don't think it's entirely a technological limitation. More often than not, the discovery of bugs is increased understanding of what we require from the code. We often say code is buggy when it surprises us in some fashion, then we tacitly invent a new requirement we say it has violated.

If you have a function

  F(signed int32 n) -> (signed int32 A, signed int32 B) 
that returns two integers (A, B) so that

  A*B = n
and we discover that F(2) -> (-2, 2147483647), which is entirely correct in a language that permits integer overflow; then we call it a bug because A and B must be smaller than n (or whatever). This was not a requirement until the bug was discovered.
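The arithmetic can be checked with a small simulation of 32-bit wraparound. Python integers don't overflow, so `wrap_int32` and `mul_int32` are hypothetical helpers that emulate two's-complement behavior.

```python
def wrap_int32(x: int) -> int:
    """Reduce an arbitrary Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def mul_int32(a: int, b: int) -> int:
    """Multiply as a language with wrapping int32 arithmetic would."""
    return wrap_int32(a * b)


def satisfies_stated_spec(n: int, a: int, b: int) -> bool:
    """The only requirement written down: A*B = n (in int32 arithmetic)."""
    return mul_int32(a, b) == n
```

Here `satisfies_stated_spec(2, -2, 2147483647)` is true, because -4294967294 wraps around to 2. The answer meets the written requirement and still surprises everyone.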
I’m not really sure if this was understandable, but I’m basically in the same camp as you are.

It’s next to impossible to have 100% bug free code. Automatic testing helps and TDD helps to write tests, but it’s not the goal. The goal is to create code that is well tested and easy to change.

> We don't have the technology to produce bug free software based solutions.

Not correct. See seL4, CompCert etc.

I talked about software-based solutions, which is also why I referred to summation/consensus-based systems like the Shuttle had. I even believe they used different programming languages, since of course compilers can also have bugs.

Take an example like seL4: it is formally proven correct against its specification. But how do we get the specification itself correct?

"Hidden Fallacies In Formally Verified Systems" https://smartech.gatech.edu/bitstream/handle/1853/62855/BOBE...

How do seL4 or CompCert protect me against hardware issues like ECC errors, or the Pentium FDIV? [1]

To humbly quote Knuth:

"Beware of bugs in the above code; I have only proved it correct, not tried it."

[1] https://en.wikipedia.org/wiki/Pentium_FDIV_bug

It is a lot easier to get the spec right than to get the implementation right.

> How do seL4 or CompCert protect me against hardware issues like ECC errors, or the Pentium FDIV?

That's a straw man argument. Nobody claims that formal proofs can protect you from hardware failures. However, formal proofs can protect you against Pentium FDIV style bugs (see ARM formal specification efforts). Intel started massively investing in formal proof methods because of the not-formally-proven-correct Pentium FDIV fiasco.

Thanks for replying, but I stressed in my previous post that I was referring to systems and solutions.

In real-world use, outside academic contexts, it is of limited use for me to claim that my component used TDD and was formally verified while throwing the responsibility for delivering a system that can't fail over the wall to somebody else.

So my claim is: we don't know how to do it. We can formally verify some components, and we hope the spec is correct.

Forgetting that the spec only covers the known knowns and unknown knowns, but not the unknown unknowns, is a kind of intellectual hubris I am not willing to bet lives on. Certainly not under the guise of using formal methods.

So I am referring to examples like this:

"A380 Flight Controls overview" [1]

• 3 PRIMary Flight Control and Guidance Computers integration of Auto Flight (ex FGEC) and Flight Control (ex FCPC)

• 3 Auto-Pilots

• 3 SECondary Flight Control Computers dissimilar Software and Hardware, simpler Control Laws

[1] https://www.fzt.haw-hamburg.de/pers/Scholz/dglr/hh/text_2007...

I think people try too hard to make it work and end up adhering to it too strictly. Same thing with agile and pair programming.
There's a hierarchy of smartness: good enough to write perfect code and need no tests, good enough to catch problems with tests, good enough to believe that better techniques would have allowed catching bugs that slipped through tests.

Any technique, in retrospect, could have been applied better, in an improved form.

Nobody is saying it does not work per se. But how many test suites have you seen that are less than helpful?
Quite a few people are saying it doesn’t work. I have worked on TDD codebases where any time you changed a function or tried to refactor something trivial, it took 3x the amount of time to fix the tests even though the code worked correctly as written. This is due to mocking or naming or expecting functions to never change. TDD proponents would say that is the “wrong” way to do it, which I'm sure it is, but that doesn’t stop it from happening and killing productivity.
This matches what the article says:

> "You change a little thing in your code and the only thing the tests suite tells you is that you will be busy the rest of the day fixing false positives."
3x is optimistic. Tests can "suffocate" a codebase to the point where no one dares to make any cross-cutting changes anymore, at all.
particularly since TDD organizations often maintain the constraint that the tests _should never change_ since they somehow embody the requirements. personally I'm not settled on an architecture until I'm part of the way through. I can't really understand how it's all going to fit together until I'm in the process of building it.
...and the 3x amount of time isn't even the biggest downside when that happens, that honor goes to how much the value of those tests erodes while they are being adapted.
I have encountered this too, just as you and the author did. However, when I switched to integration testing behaviors rather than unit testing low level functions, as the author suggests, it went away.

Traditional TDD proponents would say that what we're doing is definitely not orthodox TDD but IMHO it's the only way that actually works.
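A minimal illustration of the difference, using Python's `unittest.mock`. `Cart` and its methods are hypothetical names made up for this sketch.

```python
from unittest.mock import patch


class Cart:
    def __init__(self, prices):
        self.prices = list(prices)

    def _subtotal(self):
        # Internal helper; a refactor might rename or inline it.
        return sum(self.prices)

    def total_with_tax(self, rate=0.25):
        return self._subtotal() * (1 + rate)


def test_coupled_to_internals():
    # Pins down HOW the result is computed. Inline or rename _subtotal and
    # this fails, even though the cart still returns the right total.
    cart = Cart([60, 40])
    with patch.object(cart, "_subtotal", return_value=100) as helper:
        assert cart.total_with_tax() == 125.0
        helper.assert_called_once_with()


def test_behavior_only():
    # Pins down WHAT comes out. It survives any refactor that keeps the
    # observable behavior the same.
    assert Cart([60, 40]).total_with_tax() == 125.0
```

The first test is the kind that produces "false positives" after a trivial refactor; the second only breaks when the behavior does.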

> But how many test suites have you seen that are less than helpful?

I got ya. I have worked on software with a buggered mix of mocked and integration tests that failed about one time in four. But since the bug was in some asynchronous code and would appear in random tests, with random errors (even on thoroughly mocked tests), we couldn't easily squash it. It was one of those heisenbugs where enabling logging would cause all the tests to pass consistently.

And to put the icing on the cake, the software worked - this bug wouldn't appear in the production deploy.

So, yeah. The test suite was less than helpful. It certainly couldn't prove any qualities about the software we were writing.