by samatman 855 days ago
The only issue with SemVer is that it's a social contract. There's an available solution to this: make it a technical contract instead.

Most languages these days have a built-in test suite. They can define "no breaking changes" so that it actually means something. Have a set of tests called API. During a major release cycle, you can add tests, but you can't change the tests you have, and the tests have to keep passing. The package registry can run those tests, and if any fail, you don't get to post a minor version release with that code.
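To make the registry rule concrete, here is a minimal sketch of the check it might run before accepting a minor or patch release. Everything named here (`ApiTest`, `check_minor_release`, the idea that the registry already has each test's source and pass/fail result) is an assumption for illustration, not an existing registry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiTest:
    name: str      # unique test name (hypothetical field)
    source: str    # test body as text
    passed: bool   # result of running it against the candidate release


def check_minor_release(previous: list[ApiTest], candidate: list[ApiTest]) -> list[str]:
    """Return reasons to reject a minor/patch release; an empty list means accept."""
    new_by_name = {t.name: t for t in candidate}
    problems = []
    for old in previous:
        new = new_by_name.get(old.name)
        if new is None:
            problems.append(f"{old.name}: API test was removed")
        elif new.source != old.source:
            problems.append(f"{old.name}: API test was changed")
    # Every API test in the candidate, old or newly added, must pass.
    problems += [f"{t.name}: API test fails" for t in candidate if not t.passed]
    return problems
```

Adding tests is always allowed; removing, editing, or failing one is what blocks the release unless the major version is bumped.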

This goes from an underdefined "our API will have no breaking changes" to "this is the guaranteed behavior of the API, and cannot change until the major version number is bumped". If a downstream user of the package sees some behavior they want added to the API contract, they can write a test and submit it as a PR, and that test can go into the next release if the maintainers agree that it's a stable behavior which they don't intend to change.

When you move from e.g. 1.0 to 2.0, the tests which now fail are moved to "1.0 API", but they're never removed. No test which is ever in an "API" testset can ever be removed; the package manager enforces this. Provide some mechanism so users of the package can annotate API tests in packages they use as a part of their own test suite, so that when they upgrade, those tests failing is an immediate message about what no longer works. If you only rely on behavior which is in common from 1.0 to 2.0, it should be safe to upgrade.
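A rough sketch of the major-bump bookkeeping and the downstream check, assuming the registry has a pass/fail result per API test name and the user has a set of test names they annotated as relied upon. The function names are made up for illustration.

```python
def split_on_major_bump(results: dict[str, bool]) -> tuple[set[str], set[str]]:
    """Partition API test names into (carried forward, archived under the old major)."""
    carried = {name for name, passed in results.items() if passed}
    archived = set(results) - carried  # kept forever, just no longer enforced
    return carried, archived


def upgrade_blockers(relied_on: set[str], archived: set[str]) -> set[str]:
    """API tests a downstream user annotated that no longer hold after the bump."""
    return relied_on & archived
```

If `upgrade_blockers` comes back empty, everything the user annotated is common to both major versions.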

No more taking people's word for it when they say "no breaking changes", no more bikeshedding about what is or isn't a breaking change, just... tests. End of.

4 comments

I describe this idea of an executable feature spec in my roadmap blog post from earlier this year. I agree it’s a great way to think about it.

https://fireproof.storage/posts/roadmap-to-1.0/

We’d define 1.0 in exactly the way you describe, where we can add tests for 1.1 but not remove them without triggering 2.0

I fiddled around a little with the idea of test-driven versioning a while back. Maybe you'd find it interesting. https://github.com/abathur/tdver

I did draft a git-based implementation (https://github.com/abathur/tdverpy), but it just obviously can't be as compelling as one that was part of a language's native tooling/ecosystem could be.

This is quite similar to what I have in mind, yes. Great minds think alike!

I do think that having an API subset of tests is better than basing the system on all tests. Packages should have as many tests as possible. I frequently write tests which I know will break when I do further work on the code, so that I notice when it happens, and because if it happens accidentally it's probably a bug. I wouldn't want a versioning system to have the side effect of making people reluctant to write a test because it would commit them to the results. I envision tests migrating from the rest of the suite to the API set over time.
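One way the API subset could be marked, sketched with a plain decorator. `api_test` and `API_TESTS` are hypothetical names, not part of any existing test framework; the example tests only exercise Python builtins.

```python
API_TESTS: dict[str, object] = {}


def api_test(func):
    """Mark a test as part of the frozen API set; names must be unique."""
    if func.__name__ in API_TESTS:
        raise ValueError(f"duplicate API test name: {func.__name__}")
    API_TESTS[func.__name__] = func
    return func


@api_test
def test_sorted_returns_list():
    # Frozen: this behavior is promised until the next major version.
    assert sorted((3, 1, 2)) == [1, 2, 3]


def test_current_detail():
    # Ordinary test: free to change or delete at any time.
    assert len(list(range(3))) == 3
```

Migrating a test into the API set is then just a matter of adding the decorator once the behavior is considered stable.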

I do like that your system completely specifies the meaning of minor and patch numbers, and wonder if there's a way to tweak my proposal so that it does so as well.

I wouldn't try to insist anything I sketched out is the ~right approach. I was just trying to imagine one path through the possibility space, and then reason a bit about what kinds of information it might be able to convey.

There are almost certainly better paths through, and I suspect the idea isn't broadly usable without different kinds of tests that have different kinds of rules. It's probably not helpful to have to increment your major version just because you use snapshot testing and some dependency update causes a trivial shift in the output.

I also fiddled around a little with an idea I called "earmarks", which are basically just version-bound tests. You could use these to express the idea that, say, test_x shouldn't pass or fail until the version is >= a.b.c.

This would make it easy to deprecate an API today and go ahead and ship a test that requires the API to be present and functioning until the next major release but not after. Or, for example, to make a commitment device that asserts the project will hit some doc/lint/typing goals by some clear point.
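A small sketch of how a version-bound check like this might be gated. This is not how tdver itself does it; `earmark_active`, `since`, and `before` are names invented here, and versions are assumed to be plain dotted integers.

```python
from typing import Optional


def parse_version(text: str) -> tuple:
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def earmark_active(current: str, *, since: str = "0.0.0",
                   before: Optional[str] = None) -> bool:
    """True if a version-bound test should be enforced at version `current`."""
    now = parse_version(current)
    if now < parse_version(since):
        return False  # commitment has not kicked in yet
    return before is None or now < parse_version(before)
```

A deprecated API could carry a test enforced with `before="2.0.0"`, while a doc or typing goal could carry one enforced with `since` set to the version where the goal is due.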

Since it's an open-ended mechanism, I imagine something like it is the lower-friction way for a real project to explore applying these concepts without full toolchain support.

It's a good idea, but I think it still relies on people taking the effort to be responsible, which just doesn't seem to work long term...

That's always a risk! One of the strengths of the proposal is that if a maintainer slacks on defining a solid API testset, users can submit the tests they think belong. At that point the responsibility is baked in: once a test is added, you either keep it green or bump the version, enforced by the registry.

If a maintainer staunchly refuses to define an API, that's useful information, the kind you can't get with standard SemVer, where the only mechanism is trusting strangers to do the right thing. Which, to be fair, works ok, some of the time.

> you either keep it green or bump the version, enforced by the registry.

assert(true); is a thing. I don't think this solution would actually work. Tests might be refactored or improved and that shouldn't trigger a major release.

Malicious compliance is in fact a useful escape hatch here: a maintainer can release a 1.0 where the entire API is "test 1 + 1 == 2". That, too, is useful information.

But the package registry checks all API tests against the last version and rejects the registration if they change at all. That can be relaxed for non-semantic parts of the test, like a description of what the test means, but none of the code is allowed to change. It would be better if this were based on the AST, so that whitespace tweaks don't trigger a build failure. That's practical to achieve in most languages.

Refactoring an API test isn't worth losing the guarantees a system like this provides, and it's only the API tests which come with any restrictions, maintainers may do as they please with the rest of the test suite. An improved API test has to be provided as a new test. Part of the proposal is that users can refer to API tests in their own code, as a way to determine if tests they rely on break in a major release, so the tests need unique names, which means they can be rearranged in the file. It also means that if there's a typo in the name, or the name sucks, well, you're stuck with that until the next major release, and even then it goes on to live in infamy, forever, in the obsolete-API portion of the test suite. Not ideal, but it can't be avoided.

Hmm. Interesting. So "stable" software will most likely release with a major version somewhere in the hundreds instead of 1.0? Since initial development usually means lots of breaking changes while details are discovered/built, I can't see 1.0 having any useful meaning.

It's not all tests, it's just the API tests. I'm not sure why that was unclear to so many people. You can have hundreds of tests, thousands even; only the API tests are special.

If there's no stable behavior because the software is still at that stage of development, it's 0.x software still. That's true in SemVer as well as this refinement of it.

Contrariwise, if you think software is ready for 1.0 and you can't come up with any tests which display guaranteed behavior which won't change without that major version bump, then no, it's not ready.

I don't know why this got downvoted; it's at the very least an interesting proposal. Would love to hear a critique arguing that it's a terrible idea.

Existing package managers could even implement it in a completely backwards-compatible way: if you as a package maintainer don't care for it, you simply never add "API tests".

It's an idealistic view which will almost certainly fail. Test suites, as much as we like to hope that they reflect real usage, mostly don't. A simple example: if function a gets changed from O(n) to O(n^2) but otherwise behaves identically, most test suites will still pass, but if a user calls that function in their own inner loop, their code goes from O(n^2) to O(n^3), which can definitely break a lot of things (for instance, a transaction holds a lock for too long and gets aborted). Catching this is a high bar, and I'm fairly confident most test suites are way below it.
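To illustrate the point about complexity, here are two hypothetical implementations with identical results and different cost. A purely behavioral test cannot tell them apart.

```python
def has_duplicates_linear(items):
    """O(n): remember what has been seen in a set."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def has_duplicates_quadratic(items):
    """O(n^2): compare each item against every earlier item."""
    return any(a == b for i, a in enumerate(items) for b in items[:i])


def api_behavior_holds(impl):
    """A behavioral check of the kind an API test would make."""
    return impl([1, 2, 3]) is False and impl([1, 2, 1]) is True
```

Both implementations satisfy `api_behavior_holds`, so swapping one for the other would pass the API testset unless a test measured cost explicitly.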
You're allowed to fix bugs introduced by new releases, you know. It's called a patch release?

Is your claim that a release which introduces bad algorithmic complexity requires a major release to fix in semver? Who thinks this?

Right. There's a difference between “oh no, sorry, WE should fix that in the next patch” and “oh no, sorry, YOU should accommodate this incompatibility with a change on your side.”

I would not call it terrible, but I got a “silver bullet” vibe, which it is definitely not.

1. For a library there is API and there are implementation details. What if test depended on implementation detail?

2. What if tests had undisputable bug?

3. Test refactoring requires major release now?

4. Realistically test suite will have some execution paths not covered.

I like the idea of running the same tests over multiple versions to observe changes. But I disagree that it would automate semver. (Maybe in a very limited subset of cases.)

P.S. Not an actual downvoter, but if I had downvoted, these would have been the reasons.

> For a library there is API and there are implementation details. What if test depended on implementation detail?

If the test is in the API testset, it's API. If it isn't, it's an implementation detail.

> What if tests had undisputable bug?

If it's in the API testset, time for a major version bump. If not, fix it.

> Test refactoring requires major release now?

Only if you're refactoring the API, as defined by the API testset, thereby producing a breaking change.

> Realistically test suite will have some execution paths not covered.

Doesn't matter. If the behavior isn't in the API testset, it's not a part of the API.

> But I disagree that it would automate semver.

The point isn't to automate semver, I'm not even sure what that would mean. It's to define it, in a useful and objective way.

The point of the criticism is that defining the version and what constitutes a breaking change like this will still leave people with unexpected breakage in the real world. What you've said so far has not really addressed that point straight on, which might be the reason for the comments and downvotes, I presume.

I don't think your proposed scheme needs to be perfect in that regard, but acknowledging the concern and at least putting it in perspective would probably help.

I've no idea what downvotes you're referring to, I'm well into the black on that post. ¯\_(ツ)_/¯

SemVer is just a pinky-swear not to break people's code. In the real world, people's code breaks anyway, and then you get an argument about what's API, and expected behavior, and so on, and so forth.

What I'm proposing is simply to replace the pinky promise with tests. From some of the other comments, I think this point may have been missed: it isn't every test in your test suite, it's the ones marked "API", only.

This is a strict improvement over social-contract SemVer in two ways: one is that the package manager won't let the maintainers break the API tests without a major version bump. The other is that, if you, as a user, are unsure if some behavior is part of the stable API, you can write and submit a test to that package. If that test is accepted, great: that behavior now cannot change without a major version bump, because, again, the package manager will not bundle the package if that test breaks. Furthermore, even on a major version bump, it is instantly clear if that test is still valid, or not, you can just check before upgrading. If they don't accept the PR, you know that it isn't considered part of the API, so you add the test to your own test suite, so that at least you know quickly what broke if they change it.

> I've no idea what downvotes you're referring to, I'm well into the black on that post.

As you should be, it's a great contribution. I was referring to the downvotes mentioned in a comment further up.

I agree that what you propose is an improvement, but it can be misunderstood to claim that it can prevent _any_ real-world breakage. There will always be aspects not covered by tests and which other people still rely on.

I've had this experience with API contract tests between systems. Despite covering a lot of details and preventing deployments that failed these tests, we would occasionally run into problems where passing changes would break stuff in production. There was always an area of uncodified assumptions, and that was with tens of different clients, whereas public libraries can have millions. So I believe this also applies to your proposed solution.

You can argue that your solution significantly shrinks this area of uncertainty while also _defining_ it, which helps when reasoning about what you can depend on - and I agree. But it does not eliminate the gap, and this is what people were pointing out.

I was just a little frustrated that the discussion even went there, because I didn't think you were even claiming what they were arguing against. I think that was happening because you left a gap by not addressing it clearly, and I wanted to point it out, because people seemed to be talking past each other.

It looks to me like he neatly addressed all concerns that were brought up, even if one does not agree with the solutions he proposes. I don't see any lack of acknowledgment on OP's part.