Hacker News new | ask | show | jobs
by dbaupp 2894 days ago
No, it isn't computable (that is, correctly determining one of "these functions behave the same" or "these functions behave differently", and not "unknown") in general, as it is equivalent to the halting problem. Consider these two versions of a function, are they API compatible?

  def foo():
    return True

  def foo():
    return halts("some turing machine program")
They're only truly API compatible if the program halts, but the program can be arbitrary, so proving that any 'foo' of this style are equivalent is solving the halting problem.

Of course, one can still likely get useful answers of "definitely incompatible" etc., with much more tractable analyses. AIUI, the Elm version ends up just looking at the function signature, and catches things like removing or adding arguments: for appropriately restricted languages, it is likely to even be possible to determine if downstream code will still compile, but that's not a guarantee it will behave correctly.

4 comments

It's been a while, since I studied this, but linear bounded automata are arguably a better model for the real computers we can actually build and the halting problem is computable for LBAs.
Yeah, that seems true.

That said, in practice I'm not sure it is too useful (this is a slightly different question to the one originally asked, though). My understanding is the proof of decidability is essentially a pigeonhole principle argument based on LBA having a finite number of states: run the LBA for that many steps, if it hasn't halted, then it has looped. Even if a program is running on a system that only allows programs to use 1GB of memory, there's 2^(2^30*8) (approximately 10^(2.6e9)) states.

Essentially, the problem is not with computers, but with languages.

If your language lets you build a neverending loop/datastructure/etgc, then you cannot prove halting for all the programs that can be made by the language.

Since such languages exist, then by extension, it applies to computers.

I wonder if tighter guarantees can be given if tools can work with constructs like D's contracts. These are extra sections in each function that are intended to check invariants. If these invariants change, then they were either broken and needed fixing or the function had a semantic change.
That's an interesting idea. It seems like it would be a way for tools to flag "this function is likely to have changed in an interesting way", but changing invariants doesn't necessarily mean the function breaks semver.

For one, an invariant might be changed syntactically, without actually changing what it is asserting, reducing to exactly my example above: in its simplest form, the contract could go from in(true) to in(halts("...")).

Secondly, an contract could have been made more permissive, e.g. in(x > 1) becoming in(x > 0), or out(y > 0) becoming out(y > 1). Assuming violating a contract is regarded as a programming error (as in, it isn't considered breaking to go from failing an assertion to not failing), these are also non-breaking changes.

Lastly, changing behaviour doesn't necessarily mean changing invariants/contracts.

There can still be a semantic change, even if conditions are unchanged and valid.
D contracts are arbitrary code snippets. So they're Turing-complete, therefore it's still uncomputable.
The halting problem doesn't matter if your API is required to define timeouts / time guarantees. Thus if a#1.0() returns "hello" after 100ms and a#1.1() returns "hello" after 20s, even though they return the same result, they are deemed functionally different. This follows real-world expectations where performance which suddenly slows to the point of unusability is considered breakage, even if the same result is returned in the end.
Replace it with any other nonsense.

In practice your time guarantee isn't going to let you produce a sound and complete static analysis that proves the equivalence of two arbitrary (modulo termination guarantees) functions.

In the real world all programs have a timeout set to the time to the heat death of the universe. This hasn't helped us make sound and complete static analysis.

I think that's essentially the same point as LBA in a sibling comment. The same intractability-in-practice applies to it: even if your timeout is 10 milliseconds, that's still (tens of) millions of operations on most modern CPUs and order of a gigabit written to/read from memory, which is a huge state space.
I think it's more like a lower bound on difficulty. In other words, halting is one of many guarantees that would need to be checked to ensure that an API didn't break, and it is impossible to check, so it cannot be done, irrelevant of the difficulty of the other problems (like timeouts or whatever).
I don't think you understood my comment - you don't need to check for halting if all methods in the API have timeout guarantees. Halting is only a problem if runtime is uncertain. If runtime is certain (again, because of promised performance figures) then it is entirely possible to verify whether an API has been broken or not.
Thanks for that pedantic insight, doc.

It's patently obvious that API compatibility is a computable problem. You are just checking if API1 is a subset of API2.

Even if foo() never returns, the Application Programming Interface is unchanged if the function signature is the same.

Is it just pedantry? Even in the strongliest of practical strongly typed languages, two functions with the same signatures might fail to return on some inputs in the new version that it didn't in the old.

That's a practical distinction I care about that can't be computed.

If you're narrowly interpreting an API to mean just the function signature, then sure, if your type system is appropriately restricted then it is computable (although many type systems are Turing complete, meaning this won't be computable in all cases, for the same reasons). This is, in fact, something my comment explicitly called out.

Having signature compatibility is an important prerequisite, but I think it's possible worse to have code that compiles but does the wrong thing than to have code that doesn't compile (at least it's obvious to the developer that something needs to be fixed, in the latter case).