DRA SR-IOV Driver
Dynamic Resource Allocation (DRA) is a Kubernetes concept for flexibly requesting, configuring, and sharing specialized devices like SR-IOV network interfaces. DRA puts device configuration and scheduling into the hands of device vendors through drivers such as the DRA Driver for SR-IOV. This page outlines how to install the NVIDIA DRA Driver for SR-IOV with the NVIDIA Network Operator.
Before using the DRA Driver for SR-IOV, you should be familiar with Kubernetes Dynamic Resource Allocation concepts such as ResourceClaims, ResourceClaimTemplates, and DeviceClasses.
Overview
With DRA Driver for SR-IOV, your Kubernetes workload can allocate and consume SR-IOV Virtual Functions (VFs) from supported NVIDIA network adapters using the native Kubernetes DRA framework.
You can use the DRA Driver for SR-IOV with the SR-IOV Network Operator to deploy and manage your SR-IOV network resources.
Deployment
Warning
Running the DRA driver and the SR-IOV device plugin on the same cluster at the same time is not supported. When DRA is enabled, the SR-IOV device plugin does not run. It is recommended to delete any existing SriovNetworkNodePolicy resources before enabling DRA.
First install the Network Operator with NFD, SR-IOV Network Operator, and DRA enabled:
values.yaml:
nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
sriovOperatorConfig:
  featureGates:
    dynamicResourceAllocation: true
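With these values saved to values.yaml, the Network Operator can be installed with Helm. The following is a sketch that assumes the NVIDIA Helm repository has already been added under the name nvidia and that nvidia-network-operator is the target namespace:

```shell
# Install (or upgrade) the Network Operator with DRA enabled
helm install network-operator nvidia/network-operator \
  -n nvidia-network-operator --create-namespace \
  -f values.yaml
```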
Step 1: Create NicClusterPolicy
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  nvIpam:
    image: nvidia-k8s-ipam
    repository: nvcr.io/nvstaging/mellanox
    version: network-operator-v26.4.0-beta.1
    enableWebhook: false
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.1
    multus:
      image: multus-cni
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.1
kubectl apply -f nicclusterpolicy.yaml
Step 2: Create IPPool for nv-ipam
apiVersion: nv-ipam.nvidia.com/v1alpha1
kind: IPPool
metadata:
  name: sriov-pool
  namespace: nvidia-network-operator
spec:
  subnet: 192.168.2.0/24
  perNodeBlockSize: 50
  gateway: 192.168.2.1
kubectl apply -f ippool.yaml
Step 3: Configure SR-IOV
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ethernet-sriov
  namespace: nvidia-network-operator
spec:
  deviceType: netdevice
  mtu: 1500
  nodeSelector:
    feature.node.kubernetes.io/pci-15b3.present: "true"
  nicSelector:
    vendor: "15b3"
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: sriov_resource
kubectl apply -f sriovnetworknodepolicy.yaml
Wait for the SriovNetworkNodeState CRs to reach the Synced state:
kubectl get sriovnetworknodestates -n nvidia-network-operator
Verify that ResourceSlices are created:
kubectl get resourceslices
The following is an example of a ResourceSlice created by the DRA SR-IOV driver, showing a single
Virtual Function with its attributes:
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  generateName: c-237-177-60-062-sriovnetwork.k8snetworkplumbingwg.io-
  name: c-237-177-60-062-sriovnetwork.k8snetworkplumbingwg.io-t4mc5
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: c-237-177-60-062
spec:
  devices:
  - attributes:
      dra.net/numaNode:
        int: 0
      resource.kubernetes.io/pciBusID:
        string: "0000:08:00.4"
      resource.kubernetes.io/pcieRoot:
        string: pci0000:00
      sriovnetwork.k8snetworkplumbingwg.io/EswitchMode:
        string: legacy
      sriovnetwork.k8snetworkplumbingwg.io/PFName:
        string: eth2
      sriovnetwork.k8snetworkplumbingwg.io/deviceID:
        string: 101e
      sriovnetwork.k8snetworkplumbingwg.io/linkType:
        string: ethernet
      sriovnetwork.k8snetworkplumbingwg.io/parentPciAddress:
        string: "0000:00:00.0"
      sriovnetwork.k8snetworkplumbingwg.io/pciAddress:
        string: "0000:08:00.4"
      sriovnetwork.k8snetworkplumbingwg.io/pfDeviceID:
        string: 101d
      sriovnetwork.k8snetworkplumbingwg.io/vendor:
        string: 15b3
      sriovnetwork.k8snetworkplumbingwg.io/vfID:
        int: 2
      k8s.cni.cncf.io/resourceName:
        string: nvidia.com/sriov_resource
      k8s.cni.cncf.io/deviceId:
        string: "0000:08:00.4"
    name: 0000-08-00-4
Step 4: Create SR-IOV Network
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-rdma-network
  namespace: nvidia-network-operator
spec:
  ipam: |
    {
      "type": "nv-ipam",
      "poolName": "sriov-pool"
    }
  networkNamespace: default
  resourceName: sriov_resource
kubectl apply -f sriovnetwork.yaml
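The SR-IOV Network Operator renders this SriovNetwork into a NetworkAttachmentDefinition in the configured networkNamespace. Before referencing the network from pods, you can confirm it exists (the name and namespace here follow the example above):

```shell
# The NetworkAttachmentDefinition should appear in the networkNamespace (default)
kubectl get network-attachment-definitions -n default sriov-rdma-network
```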
Step 5: Create ResourceClaimTemplate
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: sriov-vf
spec:
  spec:
    devices:
      requests:
      - name: vf
        exactly:
          deviceClassName: sriovnetwork.k8snetworkplumbingwg.io
          count: 1
          selectors:
          - cel:
              expression: >
                device.attributes["k8s.cni.cncf.io"].resourceName == "nvidia.com/sriov_resource"
kubectl apply -f resourceclaimtemplate.yaml
Step 6: Deploy test workload
---
apiVersion: v1
kind: Pod
metadata:
  name: sriov-rdma-server
  namespace: default
  labels:
    app: sriov-rdma
    role: server
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-network
spec:
  tolerations:
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"
  restartPolicy: Never
  containers:
  - name: rdma-test
    image: nvcr.io/nvidia/doca/doca:3.1.0-full-rt-host
    command: ["/bin/bash", "-c", "sleep infinity"]
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
      privileged: true
    resources:
      claims:
      - name: vf
  resourceClaims:
  - name: vf
    resourceClaimTemplateName: sriov-vf
---
apiVersion: v1
kind: Pod
metadata:
  name: sriov-rdma-client
  namespace: default
  labels:
    app: sriov-rdma
    role: client
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-network
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: role
            operator: In
            values:
            - server
        topologyKey: kubernetes.io/hostname
  restartPolicy: Never
  containers:
  - name: rdma-test
    image: nvcr.io/nvidia/doca/doca:3.1.0-full-rt-host
    command: ["/bin/bash", "-c", "sleep infinity"]
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
      privileged: true
    resources:
      claims:
      - name: vf
  resourceClaims:
  - name: vf
    resourceClaimTemplateName: sriov-vf
kubectl apply -f pod.yaml
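Once both pods are running, you can confirm that ResourceClaims were generated from the template and that the VF was attached as a secondary network interface. Note that net1 is the name Multus typically gives the first secondary interface; this is an assumption, so adjust if your CNI configuration names it differently:

```shell
# Claims generated from the ResourceClaimTemplate and their allocation state
kubectl get resourceclaims -n default
# The VF should show up as a secondary interface with an address from sriov-pool
kubectl exec sriov-rdma-server -- ip addr show net1
```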
Resource Alignment
DRA enables end users to select resources from different DRA drivers with matching attributes to achieve
maximum performance. By using constraints with matchAttribute, the Kubernetes scheduler ensures that
allocated devices share a common topology, such as the same PCIe root complex.
The following example shows a ResourceClaimTemplate that requests both an SR-IOV VF and a GPU
from the NVIDIA DRA Driver for GPUs,
constrained to share the same PCIe root:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: resource-alignment
spec:
  spec:
    devices:
      requests:
      - name: vf
        exactly:
          deviceClassName: sriovnetwork.k8snetworkplumbingwg.io
          selectors:
          - cel:
              expression: >
                device.attributes["k8s.cni.cncf.io"].resourceName == "nvidia.com/sriov_resource"
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com
          count: 1
      constraints:
      - matchAttribute: "resource.kubernetes.io/pcieRoot"
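A workload consumes both aligned devices through a single claim. The following is a minimal sketch; the pod name, container name, and image are illustrative and not taken from this page:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: aligned-workload   # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: nvcr.io/nvidia/doca/doca:3.1.0-full-rt-host   # example image
    command: ["/bin/bash", "-c", "sleep infinity"]
    resources:
      claims:
      - name: devices   # references the pod-level claim below
  resourceClaims:
  - name: devices
    resourceClaimTemplateName: resource-alignment
```

Because both requests (vf and gpu) live in one ResourceClaimTemplate, the scheduler allocates them together and the matchAttribute constraint guarantees they share a PCIe root.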
Extended Resource Allocation by DRA
Note
This is an alpha feature in Kubernetes v1.34 (disabled by default). It requires enabling the
DRAExtendedResource feature gate in the kube-apiserver, kube-scheduler, and kubelet.
See Enable Or Disable Feature Gates
for instructions on how to enable feature gates in your cluster.
Extended resource allocation by DRA allows a DeviceClass to specify an extendedResourceName.
The scheduler then selects DRA devices matching the class for extended resource requests, enabling
pods to use the familiar resources.requests syntax to request DRA-managed devices without
explicitly creating a ResourceClaim.
This means existing workloads that use extended resources (e.g. via the device plugin) can seamlessly migrate to DRA.
Note
The SR-IOV Network Operator automatically creates a DeviceClass that can be used with this feature.
The following is an example of an auto-created DeviceClass:
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: sriov-resource
spec:
  selectors:
  - cel:
      expression: >
        device.driver == 'sriov.k8snetworkplumbingwg.io' &&
        device.attributes["k8s.cni.cncf.io"].resourceName == "nvidia.com/sriov_resource"
  extendedResourceName: nvidia.com/sriov_resource
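With the DRAExtendedResource feature gate enabled and this DeviceClass present, a pod can request a VF through the familiar extended-resource syntax, with no explicit ResourceClaim. The following sketch uses an illustrative pod name and reuses the network from the steps above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: extended-resource-pod   # illustrative name
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-network
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: nvcr.io/nvidia/doca/doca:3.1.0-full-rt-host   # example image
    command: ["/bin/bash", "-c", "sleep infinity"]
    resources:
      # Requested like a device-plugin extended resource, but backed by DRA
      requests:
        nvidia.com/sriov_resource: "1"
      limits:
        nvidia.com/sriov_resource: "1"
```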
Users can also use the special extended resource name prefix deviceclass.resource.kubernetes.io/
followed by the DeviceClass name. This works for any DeviceClass, even without a configured
extendedResourceName. The resulting ResourceClaim will contain a request for the specified
number of devices of that class.
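For instance, to request one device of the auto-created class above by its DeviceClass name rather than its extendedResourceName, a container's resources could be sketched as:

```yaml
resources:
  requests:
    deviceclass.resource.kubernetes.io/sriov-resource: "1"
  limits:
    deviceclass.resource.kubernetes.io/sriov-resource: "1"
```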
For more information, refer to the Kubernetes DRA documentation.