by mrpf1ster 500 days ago
Why would re-using a buffer be bad? Assuming you write to it with the contents of the file/stream before it is read.
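
For concreteness, here is a minimal sketch in Rust of the pattern the question describes. The helper name `sum_bytes` and the chunked read loop are assumptions made for illustration, not something taken from the article. The point it tries to show is that each `read` call reports how many bytes it wrote, and only that prefix of the reused buffer is looked at afterwards.

```rust
use std::io::{self, Read};

/// Hypothetical helper: adds up every byte of `src`, reading through one
/// caller-supplied buffer that is reused for each chunk.
fn sum_bytes<R: Read>(mut src: R, buf: &mut [u8]) -> io::Result<u64> {
    let mut total = 0u64;
    loop {
        // `read` returns how many bytes it wrote into the front of `buf`.
        let n = src.read(buf)?;
        if n == 0 {
            break; // end of stream (or a zero-length buffer)
        }
        // Only buf[..n] is fresh; anything past `n` is left over from an
        // earlier chunk (or from the caller) and is never inspected.
        total += buf[..n].iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    Ok(total)
}
```

Called as `sum_bytes(&b"abc"[..], &mut buf)` with a two-byte `buf`, the second `read` fills only one byte, so the stale second byte from the first chunk is skipped because of the `..n` slice.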
2 comments

You just answered your own question
I think they implied you would prevent that.
Why is it particularly more dangerous or likely than other logic errors?
Because the compiler optimizes based on the assumption that consecutive reads yield the same value. Reading from uninitialized memory may violate that assumption and lead to undefined behavior.

(This isn't the theoretical ivory tower kind of UB. Operating systems regularly remap a page that hasn't yet been written to.)

If you read something where you have not written, who cares whether the compiler optimizes things such that if you read from there again, you get the same value, even though that is not true?
Anyone who wants to be able to sanely debug. Code is imperfect, mistakes happen. If the compiler can optimise so that any mistake anywhere in your program could mean insane behaviour anywhere else in your program, then you get, well, C.

(E.g. imagine doing a write to an array at offset x - this is safe in Rust, so the compiler turns that into code that checks that x is within the bounds of that array, then writes at that offset. If the value of x can change, then now this code can overwrite some other variable anywhere in your program, giving you a bug that's very hard to track down)

I see what you're getting at: situations in which the compiler trusts that the location has not changed, but needs to re-load it because the cached value is not available. When the location is reloaded, the security test (like a bounds check) is not re-applied to it, yet the value being trusted is not the one that had been checked.

This is not exactly an optimization though, in the sense that it will mess up even thoroughly unoptimized code (with more likelihood, due to caching optimizations being absent).

That is to say, even the generation of basic unoptimized intermediate code for a language construct relies on the assumption that certain quantities will not spontaneously deviate from their last stored value.

That's baked into the code generation template for the construct that someone may well have written by hand. If it is optimization, it is that coder's optimization.

The intermediate code for a checked array access, though, should indicate that the value of the indexing expression is to be moved into a temporary register. The code which checks the value and performs the access refers to that temporary register. Only if the storage for the temporary registers (the storage to which they are translated by the back end) changes randomly would there be a problem.

For example, if some dynamically allocated location is used as an array index, e.g. array[foo.i] where foo is a reference to something heap allocated, the compiler cannot emit code which checks the range of foo.i and then refers to foo.i again in the access. It has to evaluate foo.i to an abstract temporary, and refer to that. In the generated target code, that will be a machine register, or a location on the stack. If the machine register or stack are flaky, all bets are off, sure. But we have been talking about memory that is only flaky until it is written to. The temporary in question is written to!
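
To illustrate the "evaluate to a temporary" point, here is a rough sketch in Rust. The `Foo` struct, the `AtomicUsize` field, and the `store_checked` name are all assumptions invented for this example; the atomic is only a stand-in for a location whose value might differ between two reads. It is not meant to be what any real compiler emits.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for `foo`: a shared location whose field `i`
/// could hold a different value each time it is read.
struct Foo {
    i: AtomicUsize,
}

/// Evaluates `foo.i` exactly once into a temporary, then performs both the
/// range check and the access through that temporary.
/// Returns false (and writes nothing) if the index is out of bounds.
fn store_checked(array: &mut [u8], foo: &Foo, value: u8) -> bool {
    // Single read of foo.i; `idx` lives in a register or on the stack.
    let idx = foo.i.load(Ordering::Relaxed);
    if idx < array.len() {
        // The access uses the checked temporary, never foo.i again.
        // (Safe Rust would bounds-check this index on its own anyway;
        // the explicit test just mirrors the check-then-access shape.)
        array[idx] = value;
        true
    } else {
        false
    }
}
```

The version to avoid would call `foo.i.load(...)` a second time inside the branch: the value checked and the value used could then disagree, which is the hazard described a few comments up.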

How common is it for operating systems to do anything other than this:

1. Initially map the not-yet-written page to a read-only page full of zeros (the same one for all allocations: only one exists in the whole system).

2. When a write takes place, copy-on-write clone that page to a newly allocated zero-filled page, then allow the write to proceed.

The "Giving advice about use of memory" section of the article answers this question directly.
And that's not something you should be depending on a compiler to verify.