There's a reason we don't code in plain English though. Natural language has ambiguities. This is the reason we invented programming languages.
It's best illustrated by the old joke:
A programmer's wife told him "Go to the store and buy milk and if they have eggs, get a dozen." He came back a while later with 12 cartons of milk.
A good chunk of all bugs in software are down to the requirements being insufficiently well specified. Further, many bugs are the discovery of new requirements when informal specification encounters reality.
"Read from standard input into this byte array" doesn't specify what to do when the input exceeds the byte array.
When you overflow the buffer, you get a "well obviously you're supposed to not do that"... but that wasn't stated at all.
When the function keeps going after a newline or a null byte or whatever, there's another "well obviously you're supposed to stop at those points". That was also not specified.
and so on.
At the point you're specifying all these cases and what to do when, it's so specific and stilted, you might as well be using a programming language.
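To make that concrete, here is a rough Java sketch of "read from standard input into this byte array" once the unstated cases are pinned down. The class and method names are made up for illustration, and each numbered comment marks a decision the English sentence never made; other choices would be just as defensible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BoundedRead {
    /**
     * Reads bytes from in into buf until a newline, end of input,
     * or a full buffer, whichever comes first.
     * Returns the number of bytes stored in buf.
     */
    public static int readLine(InputStream in, byte[] buf) {
        int n = 0;
        while (n < buf.length) {        // decision 1: never write past the array
            int b;
            try {
                b = in.read();
            } catch (IOException e) {   // decision 2: I/O failure is reported, not ignored
                throw new UncheckedIOException(e);
            }
            if (b == -1) break;         // decision 3: end of input ends the read
            if (b == '\n') break;       // decision 4: a newline ends the read and is not stored
            buf[n++] = (byte) b;
        }
        return n;                       // decision 5: anything left over stays unread in the stream
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        InputStream in = new ByteArrayInputStream(
                "hello\nhi\n".getBytes(StandardCharsets.US_ASCII));
        int n = readLine(in, buf);
        // The input line is longer than the buffer, so only 4 bytes are kept.
        System.out.println(n + " " + new String(buf, 0, n, StandardCharsets.US_ASCII)); // prints "4 hell"
    }
}
```

Five decisions for one sentence of English, and this still says nothing about character encodings or what the caller should do with a truncated line.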
Actually, programming languages were invented because speaking machine code was too much of a pain in the ass! Programming in English is a natural next step. Ambiguity is not an issue -- you keep speaking until it's resolved.
(We already program in English, in a sense, when we tell humans what we want, and they go code it. Now we'll just be telling machines.)
This type of well-defined language actually predates computers by several thousand years. Even way back in antiquity they used "programming languages" like these to get around the inherent ambiguities of natural language.
Originally these took the form of formulaic syllogisms and Aristotelian logic, and later other forms of codified language, formal logic, etc.
Adding more words often makes things less clear, not more so. What you need is well-defined terms with no overloaded meaning.
> (We already program in English, in a sense, when we tell humans what we want, and they go code it. Now we'll just be telling machines.)
Humans get it wrong all the time though. A great many bugs arise from quite simply misinterpreting the requirements. Which leads to requirements becoming more formulaic and resembling a programming language.
I disagree that logic or math are the same as a programming language; a programming language is defined by the fact that a machine can execute it.
Plus, most math and logic is still communicated and developed in a mix of human languages (English etc.) and ad-hoc, not rigorously defined notation; it's nowhere near the precision of a programming language.
Though you CAN of course grind it out at that level, if you want, but it's very unwieldy and not how people actually work.
If you're looking to deduce some sort of proof from well-defined principles and you wish to eliminate the possibility of error, then sure, well-defined terms (a rigorous language) are useful.
If you're looking to produce a software artifact, just saying what you want in high-level terms and providing iterative natural language feedback is going to work great and be way nicer than trying to formalize everything.
(Maybe not true for low-level plumbing and things that need to be secure. But for like "build an app", "build a game", "make a shell script that does x", I think it will certainly end up being true.)
Well what you want is one and only one behavior. You're going to need to be specific to the point where what you're specifying is that singular behavior. An interesting example is a binary search, which is easy to informally specify, although it's shockingly difficult to produce a correct implementation. [1]
Given what you want is a singular specific behavior, not just sorta that behavior, but exactly that behavior, it makes sense to use a language that caters to specificity.
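As a rough illustration of what "exactly that behavior" means for a search, here is a Java sketch that writes the requirement down as a check on the result. The names are hypothetical, and it assumes the common convention that -1 means "not found".

```java
public class SearchSpec {
    /**
     * True when result is an acceptable answer for searching target in arr:
     * either a valid index that holds target, or -1 when target is absent.
     */
    public static boolean meetsSpec(int[] arr, int target, int result) {
        if (result == -1) {
            for (int value : arr) {
                if (value == target) {
                    return false;   // claimed absent, but it is present
                }
            }
            return true;
        }
        return result >= 0 && result < arr.length && arr[result] == target;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 4, 4, 8};
        System.out.println(meetsSpec(sorted, 4, 1));   // true
        System.out.println(meetsSpec(sorted, 4, 2));   // true: the duplicate is accepted too
        System.out.println(meetsSpec(sorted, 5, -1));  // true
        System.out.println(meetsSpec(sorted, 8, -1));  // false
    }
}
```

Note that with duplicates this check accepts more than one index, so even this is not yet a single behavior. Whether the first match, the last match, or any match is wanted is one more thing that has to be said explicitly.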
As an aside, we've experimented with more natural-language-like programming languages before (COBOL, BASIC). They're a pain in the ass because you need to type a novel to do even trivial things. The terseness of modern programming languages is a feature.
You want one behavior from the perspective of the user's desire, but there will be a very large set of possible programs that will achieve that desire. So you just need to be precise w/r/t desire, not with the nitty gritty behavior. Traditional programming languages are great at expressing the nitty gritty behavior, and bad at capturing the desire.
Don't think of LLMs as akin to COBOL or BASIC (HyperTalk is another great example) -- those are essentially rigid languages with a sprinkling of "Englishness" to try to make them seem less foreign to novice users. And yes, that just gets in the way once you become proficient in rigid languages.
But that's a completely different beast than programming in actual English.
So for example, the reason binary search (a simple algorithm) is hard to implement is because of the mismatch between easy to understand human desire and the pickiness of the rigid language designed for the machine.
Now, it is very easy to implement a correct binary search.
Tell ChatGPT:
> Write a binary search method in C#
It responds:
public static int Search(int[] arr, int target)
{
    int left = 0;
    int right = arr.Length - 1;
    while (left <= right)
    {
        int mid = left + (right - left) / 2;
        if (arr[mid] == target)
        {
            return mid;
        }
        else if (arr[mid] < target)
        {
            left = mid + 1;
        }
        else
        {
            right = mid - 1;
        }
    }
    return -1; // Return -1 if the target is not found
}
We can also ask it:
> Prove it is correct
It responds:
> To prove the correctness of the binary search algorithm, we can use a technique called loop invariant. A loop invariant is a condition that remains true for every iteration of the loop. In this case, the loop invariant is that the target element, if it exists in the array, will always be within the range defined by the left and right pointers.
> Loop Invariant: At the start of every iteration, if the target element exists in the array, it must be within the subarray arr[left, right].
> Initialization: At the beginning, left = 0 and right = arr.Length - 1, which covers the entire array. So, if the target element is present in the array, it must be within the initial range.
> Maintenance: During each iteration, the algorithm examines the middle element (mid = left + (right - left) / 2). There are three cases:
> 1. If arr[mid] == target, the target element is found and returned.
> 2. If arr[mid] < target, the target must be in the right half of the subarray. Therefore, we update left = mid + 1. Since the array is sorted, we are sure the target is in the range arr[mid + 1, right] if it exists in the array.
> 3. If arr[mid] > target, the target must be in the left half of the subarray. Therefore, we update right = mid - 1. Since the array is sorted, we are sure the target is in the range arr[left, mid - 1] if it exists in the array.
> In cases 2 and 3, we have updated either the left or right pointer, maintaining the loop invariant for the next iteration.
> Termination: The loop terminates when left > right. Due to the loop invariant, if the target element exists in the array, it must be in the range arr[left, right]. Since left > right, this range is now empty, which means the target element is not in the array, so the algorithm returns -1.
> In conclusion, the binary search algorithm is correct, as it either returns the index of the target element if it exists in the sorted array, or -1 if it doesn't.
(OK, I'd quibble with that proof a little bit -- it was a little hand-wavy on why the loop must terminate.)
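The missing termination step is short: the width right - left + 1 is at least 1 inside the loop, and because left <= mid <= right, both left = mid + 1 and right = mid - 1 make it strictly smaller, so the loop cannot run forever. Here is a rough Java port of the C# loop above (the class name is made up) that checks that claim at run time. This checks the argument on the inputs it is given; it does not prove it.

```java
public class TerminationCheck {
    /** Same loop as the C# version above, plus a run-time check that the range shrinks. */
    public static int search(int[] arr, int target) {
        int left = 0;
        int right = arr.length - 1;
        while (left <= right) {
            int widthBefore = right - left + 1;    // at least 1 inside the loop
            int mid = left + (right - left) / 2;   // left <= mid <= right
            if (arr[mid] == target) {
                return mid;
            } else if (arr[mid] < target) {
                left = mid + 1;
            } else {
                right = mid - 1;
            }
            int widthAfter = right - left + 1;
            if (widthAfter >= widthBefore) {
                // Would mean the termination argument is wrong.
                throw new IllegalStateException("search range did not shrink");
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(search(sorted, 7));  // prints 3
        System.out.println(search(sorted, 4));  // prints -1
    }
}
```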
How about the broken binary search you linked to?
Ask ChatGPT:
> Can you find the bug in this java binary search implementation? [Code]
It responds:
> The given binary search implementation appears to be correct, assuming that the input array a is sorted in ascending order. However, there is a potential issue with integer overflow when calculating the middle index mid. To avoid this, you can calculate mid using the following line:
int mid = low + ((high - low) / 2);
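For anyone who wants to see the overflow happen, here is a small Java sketch. It assumes the linked code computes the midpoint as (low + high) / 2, which is the classic form of this bug; the class and method names are made up for illustration.

```java
public class MidpointOverflow {
    /** Assumed buggy form: low + high can exceed Integer.MAX_VALUE and wrap around. */
    public static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    /** The form ChatGPT suggested: high - low cannot overflow when 0 <= low <= high. */
    public static int safeMid(int low, int high) {
        return low + ((high - low) / 2);
    }

    public static void main(String[] args) {
        // Indices this large only occur in arrays with more than a billion elements.
        int low = 1_500_000_000;
        int high = 2_000_000_000;
        System.out.println(naiveMid(low, high)); // prints -397483648
        System.out.println(safeMid(low, high));  // prints 1750000000
    }
}
```

A negative midpoint used as an array index throws ArrayIndexOutOfBoundsException in Java, which is how the bug shows up in practice.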
So maybe now you can see how English might be a superior way to program.