Hacker News
by bangaladore 516 days ago
I feel like in pretty much every case here they still do not need arbitrary access. The point of DMA cheating is to make zero modification of the target computer. The moment a driver needs to be used to, say, allow an IOMMU range for a given device, the target computer has been tainted and you lose much of the benefit of DMA in the first place.

Does a GPU need access to the memory of a usermode application for some reason? Okay, the GPU driver should orchestrate that.

> We haven't even gotten into exotic hardware that wants to do some kind of shared memory clustering between machines, or cache cards (something like Optane) which are PCIe cards that can be used as system memory via DMA, or dedicated security processors intended to scan memory for malware etc.

Again, opt-in. The driver should specify explicit ranges when initializing the device.


> I feel like in pretty much every case here they still do not need arbitrary access.

Several of those cases do indeed need arbitrary access.

> The moment a driver needs to be used to, say, allow an IOMMU range for a given device, the target computer has been tainted and you lose much of the benefit of DMA in the first place.

The premise there being that the device is doing something suspicious rather than the same thing that device would ordinarily do if it was present in the machine for innocuous reasons.

> Does a GPU need access to the memory of a usermode application for some reason? Okay, the GPU driver should orchestrate that.

Okay, so the GPU has some CPU cores on it and if the usermode application is scheduled on any of those cores -- or could be scheduled on any of them -- then it will need access to that application's entire address space. Which is what happens by default, since they're ordinary CPU cores that just happen to be on the other side of a PCIe bus.

> Again, opt-in. The driver should specify explicit ranges when initializing the device.

What ranges? The security processor is intended to scan every last memory page. The cache card is storing arbitrary memory pages on itself and would need access to arbitrary others because any given page could be transferred to or from the cache at any time. The cluster card is presenting the entire cluster's combined memory as a single address space to every node and managing which pages are stored on which node.

And just to reiterate, it doesn't have to be anything exotic. The storage controller in a common machine is going to do DMA to arbitrary memory pages for swap.

Re everything above the quote below: you are naming esoteric reasons for allowing unfettered access to physical memory. That's fine, but what percent of players of X game are going to have such a setup in their computer? Not enough that detecting it and refusing you access to a server would be a problem.

> And just to reiterate, it doesn't have to be anything exotic. The storage controller in a common machine is going to do DMA to arbitrary memory pages for swap.

I'd like a source for that if you have one. I'd be very surprised if modern IOMMU implementations with paging need arbitrary access. The CPU / OS could presumably modify the IOMMU entries prior to the DMA swap. The OS is still the one initiating a DMA transaction.

> That's fine, but what percent of players of X game are going to have such a setup in their computer?

If the "put some CPU cores on the GPU" thing becomes popular, probably a lot.

> I'd like a source for that if you have one. I'd be very surprised if modern IOMMU implementations with paging need arbitrary access. The CPU / OS could presumably modify the IOMMU entries prior to the DMA swap. The OS is still the one initiating a DMA transaction.

Traditional paging implementations didn't use the IOMMU at all -- a lot of machines don't even physically have one, and on the ones that do, that doesn't mean the OS is using it for that. The transfer might end up going through the IOMMU if, say, the storage controller is passed through as a device to a VM guest and the host uses the IOMMU to map the storage controller's DMA to the memory pages corresponding to what the guest perceives as its physical memory, or things along those lines.

But remapping the pages for each access, even if theoretically possible, would be pretty expensive. Page table operations aren't cheap and have significant synchronization overhead, and to swap a page that way would require you to both map the page and then almost immediately do another operation to unmap it again. And that's for each 4 KiB page, since they're unlikely to be contiguous. You can do the math on how many page table operations that would add if you were swapping in, say, 500 MiB, which a modern SSD could otherwise do in tens of milliseconds. Notice in particular that this would make operating systems that do this get lower scores in benchmarks. And that this applies not just to swap as a result of being out of memory, but to ordinary file accesses, which are really a swap to the page cache.
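
To put rough numbers on "do the math", here is a back-of-the-envelope sketch in Python. The page size and transfer size come from the paragraph above; the per-operation cost passed to `added_seconds` is purely an assumed placeholder, not a measured figure, and the function name itself is made up for illustration.

```python
PAGE_SIZE = 4 * 1024           # 4 KiB pages, as in the comment above
SWAP_BYTES = 500 * 1024 ** 2   # 500 MiB swapped in

# Worst case from the comment: no two pages are contiguous,
# so every page needs its own mapping.
pages = SWAP_BYTES // PAGE_SIZE

# One map before the transfer plus one unmap after it, per page.
iommu_ops = 2 * pages


def added_seconds(cost_per_op_us):
    """Extra time spent on map/unmap, given an ASSUMED cost per operation in microseconds."""
    return iommu_ops * cost_per_op_us / 1_000_000


print(pages)      # 128000
print(iommu_ops)  # 256000
```

So the transfer picks up 256,000 extra page table operations. If each one cost, hypothetically, a microsecond once synchronization is included, `added_seconds(1.0)` comes to about a quarter of a second, stacked on top of a transfer that would otherwise finish in tens of milliseconds.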

You could also run into trouble if you tried to do that because the IOMMU may only support a finite number of mappings, or have performance issues if you create too many. Then you get a slow device with too many pending I/O operations and the whole system locks up.
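
A toy model of that failure mode, as a sketch only: `ToyIommu`, `max_mappings`, and `submit` are hypothetical names invented here, and real IOMMUs and their drivers are far more involved. It just shows pending I/O piling up once a finite mapping table is full and nothing completes to free a slot.

```python
class ToyIommu:
    """Hypothetical IOMMU with a fixed number of mapping slots (an assumption)."""

    def __init__(self, max_mappings):
        self.max_mappings = max_mappings
        self.mapped = set()

    def map_page(self, page):
        # Refuse the mapping when the table is full.
        if len(self.mapped) >= self.max_mappings:
            return False
        self.mapped.add(page)
        return True

    def unmap_page(self, page):
        # Called when an I/O completes; frees the slot.
        self.mapped.discard(page)


def submit(iommu, pages):
    """Try to map one page per pending I/O; return the pages that have to wait."""
    stalled = []
    for page in pages:
        if not iommu.map_page(page):
            stalled.append(page)
    return stalled


iommu = ToyIommu(max_mappings=4)
stalled = submit(iommu, range(10))
print(len(iommu.mapped), len(stalled))  # 4 6
```

With a slow device that never completes its first four transfers, `unmap_page` is never called, so the other six requests wait indefinitely, and anything blocked on them waits too.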

And even if you paid the cost, what have you bought? The OS could still give a device access to any given memory page for legitimate reasons and you have no way to know if the reason was the legitimate one or the user arranged for those circumstances to exist so they could access the page.