Yeah, I’m wondering what this even means. I’m assuming they’ll have to define “memory safety” which is already quite the task. Memory safe in what context? On what sort of machine? What sort of OS?
Just sharing an anecdote: recently, I had to create Linux images for x86 on an ARM machine using QEMU. During this process, I discovered that, for example, creating the initrd fails because of the memory page size (some code assumes a page size and computes the memory location to access instead of asking the system interface for that location). There's a similar problem when using the "locate" utility. There are probably a bunch more programs like this, ones that have been used successfully millions, well, probably trillions of times. The failure manifests as QEMU segfaulting when it tries to perform these operations.
But, to answer the question: I think one way to define memory safety is to ensure that the language has no way to do I/O to a memory address that wasn't obtained through a system interface. Not sure if this is too much to ask. It feels like this should be OK for application development, and it obviously will not work for system development (someone has to create the system interface that supplies valid memory addresses).
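To make the page-size point in the anecdote concrete, here is a minimal sketch in C, assuming a POSIX system. The helper names `query_page_size` and `page_floor` are hypothetical, made up for illustration; only `sysconf(_SC_PAGESIZE)` is a real interface. The idea is to ask the system for the page size and do the address arithmetic with that value, rather than baking in 4096.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: ask the OS for the page size instead of assuming 4096. */
long query_page_size(void) {
    return sysconf(_SC_PAGESIZE);
}

/* Hypothetical helper: round addr down to the start of its page.
 * page_size must be a nonzero power of two. */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}
```

With 4 KiB pages, `page_floor(0x12345, 0x1000)` gives `0x12000`; with 16 KiB pages the same address rounds down to `0x10000`, which is exactly the kind of difference a hardcoded constant gets wrong.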
I think the usual context just requires language soundness; it doesn't depend on having an MMU or anything like that. In particular, protection against:
- out-of-bounds on array read/write
- stack corruption such as overwriting the return address
It doesn't directly say "you can't use C", but achieving this level of soundness in C is quite hard (see seL4 and its Isabelle/HOL proof).
Everyone picks on C, but we have a standard for this. We've been following it for decades in regulated industries. If people take the time, it can be perfectly safe. It requires thinking of a computer as a precision machine, rather than a semantic "do what I'm thinking" box.
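As a rough illustration of the first bullet, here is what checked array access can look like when done by hand in C. This is only a sketch: `int_slice`, `slice_get`, and `slice_set` are hypothetical names, not from any library, and nothing stops other code from bypassing them, which is part of why this is hard to guarantee in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type: an array view that carries its own length. */
typedef struct {
    int *data;
    size_t len;
} int_slice;

/* Store value at index i only when i is in bounds; return whether it was stored. */
bool slice_set(int_slice s, size_t i, int value) {
    if (i >= s.len) {
        return false;
    }
    s.data[i] = value;
    return true;
}

/* Read index i into *out only when i is in bounds; return whether it was read. */
bool slice_get(int_slice s, size_t i, int *out) {
    assert(out != NULL);
    if (i >= s.len) {
        return false;
    }
    *out = s.data[i];
    return true;
}
```

A memory-safe language effectively inserts checks like these on every access (or proves them unnecessary), instead of relying on every call site remembering to use the helper.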
Maybe I lack vision in such matters, but: how would you corrupt the stack without an out-of-bounds write?
But there's another aspect that I think you missed: use after free.
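For anyone unfamiliar, a use after free never goes out of bounds: the address was valid once, it just isn't anymore. A minimal sketch of one common (and partial) C mitigation, with a hypothetical helper `free_and_clear`: null the pointer when freeing, so a later use through that pointer is a NULL dereference rather than a silent read of recycled memory. It does nothing for other copies of the pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: free *pp and set it to NULL.
 * Only this one pointer is cleared; aliases elsewhere still dangle. */
void free_and_clear(int **pp) {
    assert(pp != NULL);
    free(*pp);   /* free(NULL) is a no-op, so calling this twice is harmless */
    *pp = NULL;
}
```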
As you say, achieving this level of soundness with C is hard. Proving it is much harder. (Except, how do you know you've achieved it if you don't prove it?)
Yet that is not what memory safety means. Whether a program is memory safe depends on its actual behaviour, not on what you can prove about that behaviour. There are plenty of safe C programs and plenty of unsafe ones. Proving something is safe doesn't make it safe.
Also these properties are a very small subset of general correctness. Who cares if you write a "safe" program if it computes the wrong answer?
Not OP but you can in theory add cosmic rays, rowhammer attacks and brownout/undervolt glitching into the mix. Kinda stretching it but sometimes you have to think about these.