DRA SR-IOV Driver
Dynamic Resource Allocation (DRA) is a Kubernetes concept for flexibly requesting, configuring, and sharing specialized devices like SR-IOV network interfaces. DRA puts device configuration and scheduling into the hands of device vendors through drivers such as the DRA Driver for SR-IOV. This page outlines how to install the NVIDIA DRA Driver for SR-IOV with the NVIDIA Network Operator.
Before using the DRA Driver for SR-IOV, it is recommended that you are familiar with the following concepts:
Overview
With the DRA Driver for SR-IOV, your Kubernetes workloads can allocate and consume SR-IOV Virtual Functions (VFs) from supported NVIDIA network adapters using the native Kubernetes DRA framework.
You can use the DRA Driver for SR-IOV with the SR-IOV Network Operator to deploy and manage your SR-IOV network resources.
Limitations
Warning
This feature is supported only on vanilla Kubernetes deployments with the SR-IOV Network Operator.
Warning
On GB300, Vera Rubin, and Fractal systems, the PCIe root used to match a NIC to a GPU is not the root of the NIC itself. Instead, it is the PCIe root of the NIC’s Data Direct sub-interface. This applies to ConnectX-8 and later adapters. The DRA SR-IOV driver does not currently support this topology.
Deployment
Warning
Running the DRA driver and the SR-IOV device plugin on the same cluster at the same time is not supported.
When DRA is enabled, the SR-IOV device plugin will not run. It is recommended to delete any
existing SriovNetworkNodePolicy resources before enabling DRA.
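To see which policies would need to be removed, you can list them first. The following is a minimal sketch, assuming the operator runs in the nvidia-network-operator namespace used elsewhere on this page; the function name list_sriov_policies is hypothetical.

```shell
# Hypothetical helper: print the names of existing SriovNetworkNodePolicy
# resources, one per line. The namespace defaults to the one used on this
# page and can be overridden with the first argument.
list_sriov_policies() {
  local ns="${1:-nvidia-network-operator}"
  kubectl get sriovnetworknodepolicies.sriovnetwork.openshift.io -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}
```

Each name it prints can then be passed to kubectl delete sriovnetworknodepolicy NAME -n nvidia-network-operator once you have confirmed that the policy is no longer needed.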
First, install the Network Operator with NFD, the SR-IOV Network Operator, and DRA enabled:
values.yaml:
nfd:
enabled: true
sriovNetworkOperator:
enabled: true
sriovOperatorConfig:
featureGates:
dynamicResourceAllocation: true
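This page does not show the install command itself. As a rough sketch, the values file above could be passed to Helm as follows. The release name network-operator and the chart reference are assumptions that are not taken from this page, so the chart is left as an argument; use the chart source that matches your Network Operator release.

```shell
# Hypothetical wrapper around `helm upgrade --install`.
# Assumptions: the release name "network-operator" and the chart reference
# passed as $1. The namespace is the one used throughout this page.
install_network_operator() {
  local chart="$1"
  local values="${2:-values.yaml}"
  if [ -z "$chart" ]; then
    echo "usage: install_network_operator CHART [VALUES_FILE]" >&2
    return 1
  fi
  helm upgrade --install network-operator "$chart" \
    --namespace nvidia-network-operator --create-namespace \
    --values "$values"
}
```

Using upgrade --install lets you rerun the same command after you edit values.yaml.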
Disable the SR-IOV Resources Injector to avoid conflicts with the DRA Driver for SR-IOV:
kubectl patch sriovoperatorconfig default -n nvidia-network-operator --type merge -p '{"spec":{"enableInjector":false}}'
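To confirm that the patch took effect, you can read the field back. This is a small sketch that assumes the same resource name and namespace as the patch command above; the function name injector_disabled is hypothetical.

```shell
# Hypothetical check: succeeds only if spec.enableInjector reads back as
# "false" on the default SriovOperatorConfig.
injector_disabled() {
  local value
  value="$(kubectl get sriovoperatorconfig default -n nvidia-network-operator \
    -o jsonpath='{.spec.enableInjector}')" || return 1
  [ "$value" = "false" ]
}
```

For example: injector_disabled && echo "injector is disabled"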
Step 1: Create NicClusterPolicy
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  nvIpam:
    image: nvidia-k8s-ipam
    repository: nvcr.io/nvstaging/mellanox
    version: network-operator-v26.4.0-beta.9
    enableWebhook: false
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.9
    multus:
      image: multus-cni
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.9
kubectl apply -f nicclusterpolicy.yaml
Step 2: Create IPPool for nv-ipam
apiVersion: nv-ipam.nvidia.com/v1alpha1
kind: IPPool
metadata:
  name: sriov-pool
  namespace: nvidia-network-operator
spec:
  subnet: 192.168.2.0/24
  perNodeBlockSize: 50
  gateway: 192.168.2.1
kubectl apply -f ippool.yaml
Step 3: Configure SR-IOV
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ethernet-sriov
  namespace: nvidia-network-operator
spec:
  deviceType: netdevice
  mtu: 1500
  nodeSelector:
    feature.node.kubernetes.io/pci-15b3.present: "true"
  nicSelector:
    vendor: "15b3"
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: sriov_resource
kubectl apply -f sriovnetworknodepolicy.yaml
Wait for the SriovNetworkNodeState CRs to reach the Synced state:
kubectl get sriovnetworknodestates -n nvidia-network-operator
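Instead of rerunning the command by hand, the check can be scripted. The following sketch reads status.syncStatus from every SriovNetworkNodeState and succeeds only when all of them match an expected value. The default value "Succeeded" is an assumption about what the operator reports; pass a different value as the first argument if your version shows another string. The function name is hypothetical.

```shell
# Hypothetical helper: succeeds when every SriovNetworkNodeState reports the
# expected sync status. The "Succeeded" default is an assumption.
all_node_states_synced() {
  local want="${1:-Succeeded}"
  local states
  # One "<node>=<syncStatus>" line per node state.
  states="$(kubectl get sriovnetworknodestates -n nvidia-network-operator \
    -o jsonpath='{range .items[*]}{.metadata.name}={.status.syncStatus}{"\n"}{end}')" || return 1
  # No node states at all is treated as "not ready".
  [ -n "$states" ] || return 1
  # Fail if any node reports a different (or empty) status.
  printf '%s\n' "$states" |
    awk -F= -v want="$want" 'BEGIN { bad = 0 } $2 != want { bad = 1 } END { exit bad }'
}
```

A simple polling loop could then be: until all_node_states_synced; do sleep 10; done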
Verify that ResourceSlices are created:
kubectl get resourceslices
The following is an example of a ResourceSlice created by the DRA SR-IOV driver, showing a single
Virtual Function with its attributes:
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  generateName: c-237-177-60-062-sriovnetwork.k8snetworkplumbingwg.io-
  name: c-237-177-60-062-sriovnetwork.k8snetworkplumbingwg.io-t4mc5
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: c-237-177-60-062
spec:
  devices:
  - attributes:
      dra.net/numaNode:
        int: 0
      resource.kubernetes.io/pciBusID:
        string: "0000:08:00.4"
      resource.kubernetes.io/pcieRoot:
        string: pci0000:00
      sriovnetwork.k8snetworkplumbingwg.io/EswitchMode:
        string: legacy
      sriovnetwork.k8snetworkplumbingwg.io/PFName:
        string: eth2
      sriovnetwork.k8snetworkplumbingwg.io/deviceID:
        string: 101e
      sriovnetwork.k8snetworkplumbingwg.io/linkType:
        string: ethernet
      sriovnetwork.k8snetworkplumbingwg.io/parentPciAddress:
        string: "0000:00:00.0"
      sriovnetwork.k8snetworkplumbingwg.io/pciAddress:
        string: "0000:08:00.4"
      sriovnetwork.k8snetworkplumbingwg.io/pfDeviceID:
        string: 101d
      sriovnetwork.k8snetworkplumbingwg.io/vendor:
        string: 15b3
      sriovnetwork.k8snetworkplumbingwg.io/vfID:
        int: 2
      k8s.cni.cncf.io/resourceName:
        string: nvidia.com/sriov_resource
      k8s.cni.cncf.io/deviceId:
        string: "0000:08:00.4"
    name: 0000-08-00-4
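To check how many VFs the driver has published, you can list the device names across all of its ResourceSlices. This sketch assumes that the driver name is sriovnetwork.k8snetworkplumbingwg.io, which is inferred from the generateName and DeviceClass name on this page rather than stated explicitly. The function name is hypothetical.

```shell
# Hypothetical helper: print the name of every device published in the
# ResourceSlices of the SR-IOV DRA driver, one per line.
# Assumption: the driver name matches the DeviceClass name used on this page.
list_vf_devices() {
  kubectl get resourceslices \
    --field-selector "spec.driver=sriovnetwork.k8snetworkplumbingwg.io" \
    -o jsonpath='{range .items[*]}{range .spec.devices[*]}{.name}{"\n"}{end}{end}'
}
```

Piping the output through wc -l gives a total count. With numVfs: 8 in the policy above, each matching Physical Function would be expected to contribute eight entries.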
Step 4: Create SR-IOV Network
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-rdma-network
  namespace: nvidia-network-operator
spec:
  ipam: |
    {
      "type": "nv-ipam",
      "poolName": "sriov-pool"
    }
  networkNamespace: default
  resourceName: sriov_resource
kubectl apply -f sriovnetwork.yaml
Step 5: Create ResourceClaimTemplate
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: sriov-vf
spec:
  spec:
    devices:
      requests:
      - name: vf
        exactly:
          deviceClassName: sriovnetwork.k8snetworkplumbingwg.io
          count: 1
          selectors:
          - cel:
              expression: >
                device.attributes["k8s.cni.cncf.io"].resourceName == "nvidia.com/sriov_resource"
kubectl apply -f resourceclaimtemplate.yaml
Step 6: Deploy test workload
---
apiVersion: v1
kind: Pod
metadata:
  name: sriov-rdma-server
  namespace: default
  labels:
    app: sriov-rdma
    role: server
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-network
spec:
  tolerations:
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"
  restartPolicy: Never
  containers:
  - name: rdma-test
    image: nvcr.io/nvidia/doca/doca:3.1.0-full-rt-host
    command: ["/bin/bash", "-c", "sleep infinity"]
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
      privileged: true
    resources:
      claims:
      - name: vf
  resourceClaims:
  - name: vf
    resourceClaimTemplateName: sriov-vf
---
apiVersion: v1
kind: Pod
metadata:
  name: sriov-rdma-client
  namespace: default
  labels:
    app: sriov-rdma
    role: client
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-network
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: role
            operator: In
            values:
            - server
        topologyKey: kubernetes.io/hostname
  restartPolicy: Never
  containers:
  - name: rdma-test
    image: nvcr.io/nvidia/doca/doca:3.1.0-full-rt-host
    command: ["/bin/bash", "-c", "sleep infinity"]
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
      privileged: true
    resources:
      claims:
      - name: vf
  resourceClaims:
  - name: vf
    resourceClaimTemplateName: sriov-vf
kubectl apply -f pod.yaml
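Once the pods are scheduled, you can check which VF each generated ResourceClaim received. The following sketch prints one line per claim with the allocated device names taken from status.allocation.devices.results; the function name is hypothetical, and the namespace defaults to the default namespace used by the test pods.

```shell
# Hypothetical helper: print "<claim-name><TAB><allocated device names>" for
# every ResourceClaim in a namespace. An empty second column means the claim
# has not been allocated yet.
claim_allocations() {
  local ns="${1:-default}"
  kubectl get resourceclaims -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocation.devices.results[*].device}{"\n"}{end}'
}
```

The device names should correspond to entries in the ResourceSlices, such as 0000-08-00-4 in the example above. To see where the two pods landed, kubectl get pods -n default -l app=sriov-rdma -o wide uses the labels set in the manifests.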
Resource Alignment
DRA enables end users to select resources from different DRA drivers with matching attributes to achieve
maximum performance. By using constraints with matchAttribute, the Kubernetes scheduler ensures that
allocated devices share a common topology, such as the same PCIe root complex.
The following example shows a ResourceClaimTemplate that requests both an SR-IOV VF and a GPU
from the NVIDIA DRA Driver for GPUs, constrained to share the same PCIe root:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: resource-alignment
spec:
  spec:
    devices:
      requests:
      - name: vf
        exactly:
          deviceClassName: sriovnetwork.k8snetworkplumbingwg.io
          selectors:
          - cel:
              expression: >
                device.attributes["k8s.cni.cncf.io"].resourceName == "nvidia.com/sriov_resource"
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com
          count: 1
      constraints:
      - matchAttribute: "resource.kubernetes.io/pcieRoot"
        requests: [vf, gpu]
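A pod consumes this template the same way the test pods consume sriov-vf. The sketch below prints a minimal Pod manifest that references the resource-alignment template. The pod name aligned-workload and the claim name aligned are placeholders that do not come from this page; the image defaults to the DOCA image used in Step 6 and can be overridden. It assumes that the NVIDIA DRA Driver for GPUs is installed and that the template exists in the default namespace.

```shell
# Hypothetical helper: print a Pod manifest that consumes the
# "resource-alignment" ResourceClaimTemplate. The pod name and the claim name
# "aligned" are placeholders; the image can be overridden with $1.
aligned_pod_manifest() {
  local image="${1:-nvcr.io/nvidia/doca/doca:3.1.0-full-rt-host}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: aligned-workload
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-network
spec:
  restartPolicy: Never
  containers:
  - name: workload
    image: ${image}
    command: ["/bin/bash", "-c", "sleep infinity"]
    resources:
      claims:
      - name: aligned
  resourceClaims:
  - name: aligned
    resourceClaimTemplateName: resource-alignment
EOF
}
```

To create the pod: aligned_pod_manifest | kubectl apply -f -. Because the container references the whole claim, it receives both the VF and the GPU. If no VF and GPU share a PCIe root on any node, the claim cannot be allocated and the pod stays Pending.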