| > it offers no way to encode general recursion in any form I'd need to look into that a bit more, but at the moment I don't believe that is actually true. Dhall provides anonymous functions, or the basics of lambda calculus. This probably means that I can encode the Y-Combinator in it, resulting in anonymous recursion, though unless functions are first class, it'd by 500+ LOC. Again, I haven't reviewed the internals enough to see, but if you have functions, that's normally enough to build infinite anonymous recursion. Even if it isn't possible for some reason I haven't seen, claiming safety in that fashion probably isn't appropriate. I'm also not certain that it isn't Turing Complete. The C Pre-Processor doesn't supply a way to encode general recursion, but as we can see here [0], it is entirely possible, though to a limited depth. Leading to some debate as to whether or not it is Turing Complete. If a configuration program created by Dhall is not guaranteed to terminate in a timely manner, then you completely lose the safety there. Reduces to a normal fashion is completely useless as a guarantee if it takes 10 years, and is one of the reasons that simple markup/notation forms like YAML and JSON that are readable in O(1)/O(n) time end up preferred, rather than something that can explode into O(n^2) time. [0] https://news.ycombinator.com/item?id=4795542 |
Try writing that function. It has a type like '(a -> a) -> a', which you need to implement. You should try it, but I'll spoil it for you: Dhall has no recursive types, nor does it have recursive values. The recursive Y-combinator cannot be written in Dhall itself, it must be provided by the implementation. You can write this in Haskell using open recursion/data types, but only because Haskell allows equirecursive types, and it's already inconsistent anyway. Dhall has neither of these properties: no form of recursion exists at all.
This is a property of System F, the core itself, which is quite well established.
That said, some weird bug in the implementation itself may cause some weird behavior (like most software). But System F itself? It's not turing complete and this is well known for a very long time as a mathematical result. And having written my own implementation of the Calculus of Constructions: it's quite easy to get the core logic correctly implemented. In fact, with a few abstractions, you can correctly implement the Calculus of Constructions -- which is even more powerful than Dhall -- in about ~50 lines of Haskell code.
> Again, I haven't reviewed the internals enough to see, but if you have functions, that's normally enough to build infinite anonymous recursion.
No, it isn't. Functions have nothing to do with it, at all. In the world of lambda calculus/natural deduction/logic, the relevant bit is not functions, it is the question: can I construct any possible value, out of thin air? Or, in other words, can I create a value of type 'forall a. a', somehow? Can I create a paradox -- a logical inconsistency? In the unified world of programming languages & logic, "general recusion" is actually a form of logical inconsistency inside the system, which allows you to prove anything -- of the kind Godel described.
This is, roughly speaking, the equivalent process to determining some logical system like ZFC is actually inconsistent: such as proving `True = False` inside ZFC with nothing else. If you can create that proof out of nothing but the basics, you've created something that shouldn't exist. It is a contradiction. If you can do this, you have proven inconsistency in the underlying theory -- because you can prove anything if you already have proven 'True = False' https://en.wikipedia.org/wiki/Principle_of_explosion
And, really, it is not "roughly" the same thing -- it is the exact same thing. To find a value of type 'forall a. a', in fact, is the exact same principle as proving or showing a paradox/inconsistency in an ordinary logic.
If what you said was true, that "functions lead to turing completeness", then the existence of modus ponens -- modus ponens being the logical equivalent of "function type" in the lambda calculus -- would imply that systems like ZFC are inconsistent. But modus ponens and inconsistency have nothing to do with each other, and the fact a theory contains Modus Ponens as a valid judgement rule does NOT mean it is inconsistent.
For example, the simply typed lambda calculus has functions. But it does not have any form of polymorphism, or type variables. Thus, the general Y combinator is untypeable in this language and cannot be written, and the STLC is not Turing complete: you cannot write a function of type 'forall a. (a -> a) -> a' inside the language. It's not even possible to abstract over types, because there are no type variables! Picking a specific type 'a' means nothing at all: '(Bool -> Bool) -> Bool' is extremely easy to implement and not inconsistent in any way ('fix k = k True', done, and perfectly consistent).
It's the "for every 'a'..." part, along with the ability to deduce 'forall a' (somehow, someway), that leads to general recursion. If you want the STLC to have recursion, your compiler must implement a magic version of 'fix' on its own.
> I'm also not certain that it isn't Turing Complete.
You should be, because it isn't. There's ample research on the design of typed programming languages, the Lambda Cube, System F, and turing completeness.
A good book to start with, just for the design of these systems, is "Types and Programming Languages" by Benjamin Pierce.
> The C Pre-Processor doesn't supply a way to encode general recursion, but as we can see here [0], it is entirely possible, though to a limited depth. Leading to some debate as to whether or not it is Turing Complete.
No, it doesn't lead to anything. The C preprocessor isn't turing complete. That's it. There is no scientific debate to be had about this, anymore than there is a scientific "debate" about what temperature water boils at. The replies to the post you quoted illustrate this directly and express the difference. Whether or not "it can recurse 10,000,000 if I allow it to do so by passing a flag, and 10,000,001 times if I changed the number 10,000,000 to 10,000,0001" is rather besides the point of whether it's Turing complete.
> Reduces to a normal fashion is completely useless as a guarantee if it takes 10 years, and is one of the reasons that simple markup/notation forms like YAML and JSON that are readable in O(1)/O(n) time end up preferred, rather than something that can explode into O(n^2) time.
Correct. This is why being able to ask if any configuration is normalized is important (and asking this can be done quickly by simply seeing if there are any unreduced terms, rather than trying to actively reduce them). But let's not pretend YAML and JSON don't have their deficiencies in similar realms -- O(1) is nice and all, yet YAML is actually relatively complex to the point of introducing security vulnerabilities, and I've encountered my fare share of deficiencies/inconsistencies in JSON/YAML parsers.
You must also consider the usage scenarios. Purely for configuration, this is less likely to matter: you're unlikely to write a purposefully inconsistent/bizarre YAML file that fucks up your parser, much like you're unlikely to write a Dhall configuration which takes 10 years to reduce when evaluated. But when you are accepting arbitrary, untrusted inputs - YAML parsing security matters, just like Dhall evaluation time, and the ability to ask "is this normal form", does.
In most cases I don't see the differences or threat vectors being truly, meaningfully distinct, at all. If you trust your input sources, given the general way configuration files are used -- the evaluation time isn't really a problem. YAML O(1) or whatever doesn't really matter much either; it's generally a fixed cost anyway. But if you don't trust your input sources -- YAML should scare you just as much as Dhall.
That said, I don't think Dhall will offer a substantial benefit for most people who are already using YAML. So it's not like you should switch. To me it's sorta neat, but probably something I'd never use extensively just due to adoption factors.
But I'm afraid you're deeply mistaken on the design of the language, and don't seem to know much about programming language theory, at a glance. I don't blame you, since it's a rather dry field. But, to put it in perspective: "System F and the Lambda Cube alone are not turing complete, and always normalize" is so trivially obvious in the PLT world that if you asked someone to "prove it" to you, they would probably say "It's true by definition of the system itself, what are you even asking" before staring at you blankly (or, they realize you don't know the field, and that they're going to have to start from square 1, as far as computability/language theory goes, to explain why this is the case.) It's an old and well accepted result.
TAPL, the book I mentioned earlier is a good book to introduce yourself to many concepts like this and should help clear up any confusion you might have.