Contra another comment: fairly low. (Or at least my search-fu has not been able to find any CVEs or published papers about breaking isolation between MIG instances. MPS should generally be used by only one user, so that multiple of their own CUDA apps can attach to one (v)GPU.)
I remember a few years ago my hardware security professor suggested we try to implement Rowhammer on GPU. I ended up doing something else, but it looks like someone got there: https://arxiv.org/abs/2507.08166
MPS should only be used where all the workloads trust each other. It is similar to running multiple games on your computer simultaneously.
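To make the "one user, several of their own apps" model concrete, here is a minimal sketch of how clients might be pointed at a single shared MPS daemon. `CUDA_MPS_PIPE_DIRECTORY` and `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` are documented MPS environment variables; the helper name `mps_client_env` and the example path are made up for illustration, and the sketch assumes a daemon was already started with `nvidia-cuda-mps-control -d` using the same pipe directory.

```python
import os


def mps_client_env(pipe_dir, thread_percentage=None, base_env=None):
    """Build an environment for a CUDA process that should attach to an MPS daemon.

    Hypothetical helper: every client given the same pipe_dir shares one MPS
    server, which is why they should all belong to mutually trusting workloads.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_MPS_PIPE_DIRECTORY"] = pipe_dir
    if thread_percentage is not None:
        if not 0 < thread_percentage <= 100:
            raise ValueError("thread_percentage must be in (0, 100]")
        # Optional cap on the share of SM threads this client may use.
        env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_percentage)
    return env


# Two of the same user's jobs, both attached to the same (example) pipe directory.
env_a = mps_client_env("/tmp/mps-example", thread_percentage=50, base_env={})
env_b = mps_client_env("/tmp/mps-example", thread_percentage=50, base_env={})
```

Each dictionary could then be passed as `env=` to `subprocess.Popen`. Note that the thread-percentage cap limits compute share only; it is not a security boundary.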
You cannot use NVLink with MPS or MIG: it is not isolated, and malformed NVLink messages can be authored in userspace and can crash the whole GPU. Some vendors, like Modal, allow you to request NVLink'd shared GPUs anyway.
MIG only makes sense for cloud providers. MPS only makes sense for interactive (read: not ML) workloads. Workloads needing more than 1 GPU cannot use either.
I do not see MIG mentioned in either paper. I do not think the papers are examining isolation security between instances, which the GP was asking about.
As per sibling comment, this is about utilization efficiency and not breaking isolation (between MIG instances). The conclusion:
> In this paper, we presented MISO, a technique to leverage the MIG functionality on NVIDIA A100 GPUs to dynamically partition GPU resources among co-located jobs. MISO deploys a learning-based method to quickly find the optimal MIG partition for a given job mix running in MPS. MISO is evaluated using a variety of deep learning workloads and achieves an average job completion time that is lower than the unpartitioned GPU scheme by 49% and is within 10% of the Oracle technique.
MIG is used a lot in HPC and multi-tenancy cloud, where isolation is important. See Figure 1 and §6.2:
* https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
The card is actually sliced into different instances (which show up as different /dev/nvidiaX devices), each with its own SMs, L2, and DRAM, isolated from one another. (MPS is for the same user to share a GPU instance: it lets multiple CUDA apps attach, and time-slicing occurs.)
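For anyone who wants to see the slicing from userspace, here is a rough sketch that parses the text printed by `nvidia-smi -L` and pins a process to one MIG instance through `CUDA_VISIBLE_DEVICES`. The sample listing is made up (the UUIDs are placeholders) and the regex assumes the usual `MIG <profile> Device <n>: (UUID: MIG-...)` line shape, which may differ across driver versions; `parse_mig_devices` and `env_for_mig` are hypothetical helper names.

```python
import os
import re

# Assumed shape of a MIG line in `nvidia-smi -L` output, e.g.
# "  MIG 3g.20gb     Device  0: (UUID: MIG-...)".
_MIG_LINE = re.compile(
    r"^\s*MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):\s+"
    r"\(UUID:\s*(?P<uuid>MIG-[^)\s]+)\)"
)


def parse_mig_devices(listing):
    """Return one dict per MIG instance found in an `nvidia-smi -L` listing."""
    devices = []
    for line in listing.splitlines():
        m = _MIG_LINE.match(line)
        if m:
            devices.append(
                {"profile": m["profile"], "index": int(m["index"]), "uuid": m["uuid"]}
            )
    return devices


def env_for_mig(uuid, base_env=None):
    """Environment that restricts a CUDA process to a single MIG instance."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = uuid
    return env


# Made-up listing for illustration; real output comes from `nvidia-smi -L`.
SAMPLE = """GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-aaaaaaaa-0000-0000-0000-000000000000)
  MIG 3g.20gb     Device  0: (UUID: MIG-11111111-0000-0000-0000-000000000000)
  MIG 2g.10gb     Device  1: (UUID: MIG-22222222-0000-0000-0000-000000000000)
"""

mig_devices = parse_mig_devices(SAMPLE)
tenant_env = env_for_mig(mig_devices[0]["uuid"], base_env={})
```

A process launched with `tenant_env` should see only that one slice as its GPU, which is the property a multi-tenant scheduler relies on.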