	by bix6 265 days ago
	How real is the risk of information leakage if I’m on a shared GPU with multiple users?

4 comments

throw0101d 265 days ago

Contra another comment: fairly low. (Or at least my search-fu has not been able to find any CVEs or published papers about breaking isolation between MIG instances. MPS should generally be used by only one user, so that multiple of their own CUDA apps can attach to one (v)GPU.)

MIG is used a lot in HPC and multi-tenancy cloud, where isolation is important. See Figure 1 and §6.2:

* https://docs.nvidia.com/datacenter/tesla/mig-user-guide/

The card is actually sliced into different instances (which show up as different /dev/nvidiaXs), each with its own SMs, L2, and DRAM, isolated from one another. (MPS is for the same user to share a GPU instance: it allows multiple CUDA apps to attach, and time-slicing occurs.)
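To make the "separate instances" point concrete, here is a minimal sketch of pinning a process to one MIG instance. It assumes the listing format that `nvidia-smi -L` prints on recent drivers (older drivers use a different MIG UUID style), and it assumes the usual approach of putting the instance's UUID in `CUDA_VISIBLE_DEVICES`. The sample text, the UUIDs, and the helper names `parse_mig_devices` and `env_for_instance` are made up for illustration.

```python
import re

# Sample text in the shape `nvidia-smi -L` prints on a MIG-enabled GPU.
# The UUIDs are invented for illustration only.
SAMPLE = """\
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-11111111-2222-3333-4444-555555555555)
  MIG 3g.20gb     Device  0: (UUID: MIG-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
  MIG 3g.20gb     Device  1: (UUID: MIG-ffffffff-0000-1111-2222-333333333333)
"""

# Matches the indented "MIG <profile> Device <n>: (UUID: MIG-...)" lines.
MIG_LINE = re.compile(
    r"^\s+MIG\s+(\S+)\s+Device\s+(\d+):\s+\(UUID:\s+(MIG-[^)]+)\)"
)


def parse_mig_devices(listing):
    """Return one dict per MIG instance found in the listing text."""
    devices = []
    for line in listing.splitlines():
        match = MIG_LINE.match(line)
        if match:
            profile, index, uuid = match.groups()
            devices.append(
                {"profile": profile, "index": int(index), "uuid": uuid}
            )
    return devices


def env_for_instance(device):
    """Environment that restricts a CUDA process to a single MIG instance."""
    return {"CUDA_VISIBLE_DEVICES": device["uuid"]}


if __name__ == "__main__":
    for dev in parse_mig_devices(SAMPLE):
        print(dev["profile"], env_for_instance(dev))
```

On a real machine you would feed the function the output of `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout` instead of `SAMPLE`, and pass the resulting environment to each tenant's process.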

saagarjha 265 days ago

Is anyone actually looking at this platform?

throw0101d 265 days ago

> Is anyone actually looking at this platform?

Question unclear: looking at it to use (yes: lots in HPC, hypervisors), or looking at it from a security POV (don't know)?

saagarjha 265 days ago

Yeah, I'm talking about the latter.

LPisGood 265 days ago

I remember a few years ago my hardware security professor suggested we try to implement Rowhammer on GPU. I ended up doing something else, but it looks like someone got there: https://arxiv.org/abs/2507.08166

doctorpangloss 265 days ago

With MIG the risk is low; an exploit would be exotic.

MPS should only be used where all the workloads trust each other. It is similar to running multiple games on your computer simultaneously.

You cannot use NVLink with MPS or MIG: it is not isolated, and malformed NVLink messages can be authored in userspace and can crash the whole GPU. Some vendors, like Modal, allow you to request NVLink'd shared GPUs anyway.

MIG only makes sense for cloud providers. MPS only makes sense for interactive (read: not ML) workloads. Workloads needing more than 1 GPU cannot use either.

woadwarrior01 265 days ago

Very real.

https://www.usenix.org/system/files/usenixsecurity24-guo-yan...

https://www.sciencedirect.com/science/article/pii/S016740482...

throw0101d 265 days ago

I do not see MIG mentioned in either paper. I do not think the papers are examining isolation security between instances, which the GP was asking about.

woadwarrior01 265 days ago

Yeah, I only posted two links from my notes, from when I was looking at this a few months ago. Here's one on MIG.

https://arxiv.org/abs/2207.11428

throw0101d 265 days ago

As per sibling comment, this is about utilization efficiency and not breaking isolation (between MIG instances). The conclusion:

> In this paper, we presented MISO, a technique to leverage the MIG functionality on NVIDIA A100 GPUs to dynamically partition GPU resources among co-located jobs. MISO deploys a learning-based method to quickly find the optimal MIG partition for a given job mix running in MPS. MISO is evaluated using a variety of deep learning workloads and achieves an average job completion time that is lower than the unpartitioned GPU scheme by 49% and is within 10% of the Oracle technique.

stygiansonic 265 days ago

That paper doesn’t seem to be about security vulnerabilities in MIG, but rather about using it to improve workload efficiency.
