by chrisper 3461 days ago
How is that even possible with virtual memory and paging? Why would the same video data or whatever go to the same physical location every time?
3 comments

1. The OS likes to fill physical memory with cache, rather than leaving it empty, because modern memories zero quickly enough for the OS to pretend it has "free memory", only actually freeing it on demand.

2. "Clean" mmap(2)ed pages count as "cache" in the above.

3. Memory allocations are usually physically contiguous: when you mmap(2) something and then read every byte of it in order, it usually ends up in a contiguous run of physical memory as clean page-cache entries. This holds to the extent that no other memory pressure forces the page cache to fragment, evict other caches, or overwrite itself.

4. ASLR only juggles virtual memory, not physical memory. IIRC (not a kernel dev), Linux at least has an allocation strategy that will effectively allocate physical memory serially if there's no contention.

Put 1-4 together, and you get a system where a big mmap(2)ed file that takes up all the physical memory of an otherwise-idle system will end up in the same places each time (because that's the only place such a large allocation will fit).
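To make the argument in points 1-4 concrete, here is a toy model in Python. It is only a sketch, not how Linux actually allocates memory: the lowest-free-frame-first policy, the frame counts, and BAD_FRAME are all made up for illustration. It shows that if the machine boots into the same state and one big job fills the remaining memory, the job covers the same physical frames every time.

```python
class ToyFrameAllocator:
    """Hands out the lowest-numbered free physical frames first (a toy policy)."""

    def __init__(self, total_frames):
        self.free = set(range(total_frames))

    def alloc(self, n):
        if n > len(self.free):
            raise MemoryError("not enough free frames")
        frames = sorted(self.free)[:n]
        self.free.difference_update(frames)
        return frames


def boot_and_run(total_frames, os_frames, job_frames):
    """Simulate one boot: the OS takes its frames, then one job allocates."""
    allocator = ToyFrameAllocator(total_frames)
    allocator.alloc(os_frames)          # "core stuff" loaded at boot
    return allocator.alloc(job_frames)  # e.g. the big mmap(2)ed file


BAD_FRAME = 900  # hypothetical defective frame, chosen arbitrarily

run1 = boot_and_run(total_frames=1024, os_frames=200, job_frames=800)
run2 = boot_and_run(total_frames=1024, os_frames=200, job_frames=800)
small = boot_and_run(total_frames=1024, os_frames=200, job_frames=100)

print(run1 == run2)        # True: same frames on both runs
print(BAD_FRAME in run1)   # True: the big job reaches the bad frame
print(BAD_FRAME in small)  # False: a small job never gets that far
```

The last line is the flip side: in this model a workload that uses little memory would never notice the bad frame at all.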

I can imagine bunches of possibilities. Paging can make things hard to predict, especially when multiple programs are allocating memory, but it doesn't make the system non-deterministic, nor does it make hitting the same physical address impossible.

One possibility is that he didn't restart the program between retries, and the memory in question was already allocated. Another possibility is that he only ran Handbrake and nothing else, and the OS was in more or less the same state both times. It could also be that the problem was triggered by stack allocations rather than heap allocations: the video block in question might have caused a large-ish recursion that hit the bad memory, and it would be likely to hit it no matter what else was running, since large stack allocations are somewhat rare.

Chances are it was actually none of those things, but they're real possibilities anyway.

Maybe my Handbrake installation was broken because of defective RAM - I don't know exactly... anyway: I found the problem was RAM and now it works...

It's actually scary how much (unpredictable and maybe undetectable) stuff can happen due to bad RAM.

I once had a bad RAM socket. I sent back RAM that failed memtest86 and was rather confused when the next set failed in the same way.

Yeah! Bad bit baked into the executable is a strong possibility.

Presuming "frozen PC" means "unresponsive and must be forcibly rebooted", there would be no retrying without restarting the program.

What with Windows Update and the variety of other similar OS- and application-level auto-updaters, is getting the computer into a very similar state likely? I'm not sure but my gut says no.

That said, at first I was imagining a desktop computer with 4 or 8 memory modules, but given a machine with just 2 modules, maybe it follows that one module usually gets filled with "core stuff" and the second, defective module somewhat infrequently sees "big user stuff" after the first module is filled, and I guess that isn't too much of a head-scratcher when it comes to identifying the source of the problem.

> but it doesn't make the system non-deterministic

Actually, it does. It's possible to calculate WCET involving RAM accesses, as the behaviour is deterministic; there's a set latency.

It's not possible whenever swap is involved, which is why most of the realtime world simply avoids swap. This is mentioned in the Genode handbook, if you're willing to dig into it more.

I guess I need to clarify I was talking about read/write location determinism, and not timing. Timing could affect things, but it also might not. The question is only whether the same bit was touched, not whether the entire system state is identical in every aspect.
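For what it's worth, on Linux you can in principle check which physical frame backs a given virtual page through /proc/<pid>/pagemap. A rough Python sketch follows; the function names are mine, and it assumes the documented entry layout (bits 0-54 page frame number when present, bit 62 swapped, bit 63 present). Note that on recent kernels the frame number reads back as zero unless you run it with root privileges.

```python
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number (when present)
SWAPPED_BIT = 1 << 62      # page is in swap
PRESENT_BIT = 1 << 63      # page is in RAM


def decode_pagemap_entry(entry):
    """Split one 64-bit pagemap entry into its main fields."""
    present = bool(entry & PRESENT_BIT)
    swapped = bool(entry & SWAPPED_BIT)
    pfn = entry & PFN_MASK if present else None
    return {"present": present, "swapped": swapped, "pfn": pfn}


def lookup(vaddr, pid="self"):
    """Read the pagemap entry for the page containing virtual address vaddr."""
    offset = (vaddr // PAGE_SIZE) * 8  # one 8-byte entry per virtual page
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        (entry,) = struct.unpack("=Q", f.read(8))
    return decode_pagemap_entry(entry)
```

Comparing the frame number for the same buffer across two runs would tell you whether the same physical location was hit, independent of any timing differences.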

I assumed that swap wasn't involved, and I was even going to mention it but decided against it. While it is remotely possible, there's not much reason to suspect swap: Handbrake was actively running, it doesn't normally use enough memory to start swapping, and people ripping DVDs usually know not to be doing other things and/or using all their memory while ripping.

That said, are you saying swap in the OS really is non-deterministic by design, or just hard to predict? And what does Genode have to do with this, assuming he wasn't using Genode?

Before ASLR this wasn't unusual at all.