| > Marsha does not fall into this, since it can already generate working code, but the bar for what is a programming language, I believe, is lower than most would immediately think. Well, we can go back and forth about the technical definition of individual words all day, but 'is it a programming language?' is such a vague question, the argument is basically meaningless. Do you want to put that label on it? Ok. Someone else disagrees? Huh. Someone called something else a programming language? Someone disagreed with that? eh... Since it's purely opinion based, who cares? There's no answer which is 'right'. I would argue that regardless of semantic details about terminology, there is a fundamental difference between what you're doing here and most common programming languages: You can have: 1) A series of instructions to do a task, which can be unambiguously mapped into a series of instructions in another format. or 2) A series of instructions to do a task, which is mapped non-deterministically into a series of instructions in another format. Just like you have functions (deterministic) and probability functions (non-deterministic), there is a difference here between those two things. ... In this case, you're basically generating non-deterministic imperative logic; that's obviously and unambiguously distinct from a deterministic sequence of imperative logic. It is novel; it is interesting. ...but I don't think it's worth the argument about 'is it a programming language'; it's clearly very different from existing languages. > improving the ability of the reader to understand the core of the algorithm, rather than getting bogged down in the implementation details of this or that programming language. There is still a formalism, but it differs from that of traditional programming languages, and Marsha lets you work with your computer in a similar way. I applaud this intent, but I'm skeptical. Once again, you are non-deterministically mapping the 'core logic' of the algorithmic into a sequence of deterministic steps that may or may not match the request. That's the point; it's non-deterministic. It could do anything; the P value of it doing something crazy might drop, but it's not zero; and fundamentally, how can you rely on a system where the instructions you give may or may not map to the machine code output? You add tests? Sure... but, those are generated too right? You have to dance through a series of tighter and tighter hoops to try to reduce the P value of "crazy hallucination and chaos", but I see no meaningful insight here about how you plan to mitigate that problem completely? ...and if you don't mitigate it completely, unlike a constraint solver, the non-deterministic output you get cannot be validated to be correct... It's not about specifying the syntax in a different more readable form; it's about confidence that the output matches the constraints of the input; and I don't see that here. Given the context length (and nature of large contexts in general) in LLMs, I also ponder whether it's even possible to do this beyond the trivial form, because it seems like as the constraint set scales, the capability of any LLM to address those constraints (and to be confident that it has) seems like a difficult problem to solve. However, I would like to say that I see this domain as an interesting area of research; and most certainly neither a) a solved problem, or b) a dead end. There's definitely stuff here worth playing with and exploring. ...regardless of if people think of it 'as a programming language', or not. |
> 1) A series of instructions to do a task, which can be unambiguously mapped into a series of instructions in another format.
> or
> 2) A series of instructions to do a task, which is mapped non-deterministically into a series of instructions in another format.
You're a bit too black-and-white on this situation. Floating point calculations often suffer subtle differences in behavior based on optimization flags[1] or CPU architecture[2]. I would not consider C non-deterministic, but this is a situation where differences show up without changes to the code being compiled.
> It could do anything; the P value of it doing something crazy might drop, but it's not zero; and fundamentally, how can you rely on a system where the instructions you give may or may not map to the machine code output?
>
> You add tests? Sure... but, those are generated too right?
>
> You have to dance through a series of tighter and tighter hoops to try to reduce the P value of "crazy hallucination and chaos", but I see no meaningful insight here about how you plan to mitigate that problem completely?
1. Marsha in the here-and-now definitely does have a small probability of actually generating junk tests that it can also somehow generate working code for.
2. Different tools for different scenarios, so if that is a huge problem, don't use Marsha as it currently is.
3. Since LLMs are trained on human generated code explained by humans, the code it generates is human readable, so you can always review the generated output before you rely on it, right now. Human still in the loop, but the amount of work significantly reduced.
4. Trivially, you can get determinism in output by setting temperature to 0, though that also means if it fails to generate an output it will always fail to generate an output.
5. A fully predictable output requires a formalism essentially equal to existing programming languages. The purpose of Marsha is to explore relaxing that for development velocity and simplicity, but it is intended to be a gradient you can choose from so I agree it should be possible. Nothing solid figured out now, but simply dropping into your target language of choice would be an "easy" patch, though it defeats the purpose of the language. Something like Rust/Haskell pattern matching or Lean/Coq constraints informing you of missing definitions would be better, but honestly unsure how to get there.
> Given the context length (and nature of large contexts in general) in LLMs, I also ponder whether it's even possible to do this beyond the trivial form, because it seems like as the constraint set scales, the capability of any LLM to address those constraints (and to be confident that it has) seems like a difficult problem to solve.
This one doesn't seem as hard to me, because of the divide-and-conquer nature of programming. Each individual function gets its own context, and if that function is too big, break it into chunks and generate those independently. Definitely more of a Lisp-y style instead of a big blob of old school PHP.
May also become effectively irrelevant if useful context size exceeds the length of something massive like a novel.
> However, I would like to say that I see this domain as an interesting area of research; and most certainly neither a) a solved problem, or b) a dead end. There's definitely stuff here worth playing with and exploring.
Fully agree. Marsha of today doesn't "solve" it (and depending on your acceptance level of C floating point changes, may never do so) but I say pretty confidently that it is further along than Copilot, and I don't see why it won't improve in the future.
[1]: https://stackoverflow.com/questions/7517588/different-floati... [2]: https://stackoverflow.com/questions/64036879/differing-float...