AWS
By default, a GPU does not provide resource isolation when multiple containers share it.
NVIDIA provides the Multi-Process Service (MPS), an alternative, binary-compatible implementation of the CUDA API that improves resource utilization for applications running in parallel. MPS recently gained a QoS feature that lets programmers set an upper limit on the number of GPU threads available to each application, which caps compute bandwidth on a per-application basis.
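A minimal sketch of how that per-application limit might be applied when launching MPS clients. It assumes the cap is set through the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable (supported on Volta and later GPUs) and that an MPS control daemon is already running on the node. The helper names `mps_client_env` and `launch_client` are hypothetical, and the child process here is only a stand-in that echoes the variable a real CUDA application would see.

```python
import os
import subprocess
import sys


def mps_client_env(thread_percentage, base_env=None):
    """Return a copy of the environment with an MPS thread cap set.

    Hypothetical helper: it only prepares the environment and does not
    talk to the MPS control daemon.
    """
    if not 0 < thread_percentage <= 100:
        raise ValueError("thread_percentage must be in (0, 100]")
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the MPS client reads this variable when it creates its CUDA context.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_percentage)
    return env


def launch_client(command, thread_percentage):
    """Start one client process with its own thread cap (hypothetical helper)."""
    return subprocess.Popen(command, env=mps_client_env(thread_percentage))


if __name__ == "__main__":
    # Stand-in for a real CUDA application: print the cap it would inherit.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_MPS_ACTIVE_THREAD_PERCENTAGE'])",
    ]
    launch_client(child, 25).wait()
```

Setting the variable per process, rather than once for the whole node, lets each container sharing the GPU receive a different cap. Note that this limits compute threads only; it does not by itself isolate GPU memory between clients.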
NVIDIA
Hypervisor-based vGPU