OpenShift Virtualization SR-IOV Integration
This guide explains how to attach SR-IOV Virtual Functions (VFs) to OpenShift Virtualization virtual machines using VFIO PCI passthrough.
With deviceType: vfio-pci, a VF’s PCI device is passed directly into the guest VM via the VFIO userspace interface.
The VM gets near-native network performance because the data path bypasses the host kernel entirely.
For more information, refer to the OpenShift Virtualization documentation.
Prerequisites
OpenShift Virtualization installed on the cluster:
Red Hat OpenShift: OpenShift Virtualization installation guide
IOMMU enabled on worker nodes:
Intel: kernel parameter intel_iommu=on iommu=pt
AMD: kernel parameter amd_iommu=on iommu=pt
vfio-pci kernel module available on worker nodes
virtctl CLI tool installed:
Red Hat OpenShift: Installing virtctl on OpenShift
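On OpenShift, kernel parameters are usually applied through a MachineConfig rather than by editing the bootloader by hand. The following is a minimal sketch of that approach, not a definitive procedure: the helper name render_iommu_machineconfig and the object name 100-worker-iommu are hypothetical, and the worker role label assumes the default worker machine config pool.

```shell
# Print a MachineConfig that adds the IOMMU kernel arguments for worker nodes.
# The object name "100-worker-iommu" is a hypothetical choice.
# $1: CPU vendor, either "intel" or "amd"
render_iommu_machineconfig() (
  case "$1" in
    intel) iommu_arg="intel_iommu=on" ;;
    amd)   iommu_arg="amd_iommu=on" ;;
    *)     echo "usage: render_iommu_machineconfig intel|amd" >&2; exit 1 ;;
  esac
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 100-worker-iommu
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - ${iommu_arg}
    - iommu=pt
EOF
)
```

One possible usage is render_iommu_machineconfig intel | oc apply -f -. Applying a MachineConfig that changes kernel arguments causes the affected nodes to reboot, so plan for a rolling restart of the workers.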
Node Feature Discovery
To enable Node Feature Discovery, please follow the official guide. A single instance of Node Feature Discovery is expected to be used in the cluster.
An example of Node Feature Discovery configuration:
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  operand:
    image: registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.16
    imagePullPolicy: Always
  workerConfig:
    configData: |
      sources:
        pci:
          deviceClassWhitelist:
            - "02"
            - "03"
            - "0200"
            - "0207"
          deviceLabelFields:
            - vendor
Verify that the following label is present on the nodes containing NVIDIA networking hardware: feature.node.kubernetes.io/pci-15b3.present=true
For more details, see the official NFD documentation.
oc describe node | grep -E 'Roles|pci' | grep -v "control-plane"
Roles: worker
cpu-feature.node.kubevirt.io/invpcid=true
cpu-feature.node.kubevirt.io/pcid=true
feature.node.kubernetes.io/pci-102b.present=true
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-10de.sriov.capable=true
feature.node.kubernetes.io/pci-14e4.present=true
feature.node.kubernetes.io/pci-15b3.present=true
feature.node.kubernetes.io/pci-15b3.sriov.capable=true
Roles: worker
cpu-feature.node.kubevirt.io/invpcid=true
cpu-feature.node.kubevirt.io/pcid=true
feature.node.kubernetes.io/pci-102b.present=true
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-10de.sriov.capable=true
feature.node.kubernetes.io/pci-14e4.present=true
feature.node.kubernetes.io/pci-15b3.present=true
feature.node.kubernetes.io/pci-15b3.sriov.capable=true
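Filtering the describe output by eye gets tedious on larger clusters. As a rough sketch, a label selector can list only the nodes that carry the vendor label. The helper name nodes_with_pci_vendor is hypothetical; the label format is the one shown in the output above.

```shell
# List nodes carrying the NFD "present" label for a PCI vendor ID.
# $1: PCI vendor ID (default: 15b3)
# Returns non-zero when no node matches.
nodes_with_pci_vendor() (
  vendor="${1:-15b3}"
  nodes="$(oc get nodes -l "feature.node.kubernetes.io/pci-${vendor}.present=true" -o name)"
  if [ -z "$nodes" ]; then
    echo "no nodes labeled for PCI vendor ${vendor}" >&2
    exit 1
  fi
  echo "$nodes"
)
```

Calling nodes_with_pci_vendor with no argument checks for vendor 15b3. Pass another vendor ID to check a different device family.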
SR-IOV Network Operator
If you are planning to use SR-IOV, follow these instructions to install the SR-IOV Network Operator on OpenShift Container Platform.
Warning
The SR-IOV resources created will have the openshift.io prefix.
Note
SR-IOV Network Operator configuration documentation can be found on the Official Website.
Step 1: Create an SriovNetworkNodePolicy
Configure VFs with deviceType: vfio-pci. The operator creates the VFs and binds them to the vfio-pci driver, making them available as allocatable extended resources on the node.
Set isRdma: false (host-side RDMA is not compatible with vfio-pci). The guest VM must have the mlx5_core kernel module available.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: kubevirt-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: kubevirt_sriov
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  nicSelector:
    vendor: "15b3"
    pfNames:
      - ens1f0
  deviceType: vfio-pci
  isRdma: false
Wait for the policy to be applied:
oc get sriovnetworknodestates -n openshift-sriov-network-operator -o jsonpath='{.items[*].status.syncStatus}'
The output should show Succeeded for all nodes.
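Node reconfiguration can take several minutes, so a polling loop may be more convenient than re-running the command by hand. The sketch below wraps the same oc query in a loop. The function name wait_for_sriov_sync and the default timeout and interval are assumptions, not operator defaults.

```shell
# Poll until every SriovNetworkNodeState reports syncStatus "Succeeded".
# $1: timeout in seconds (default 600); $2: poll interval in seconds (default 10)
wait_for_sriov_sync() (
  timeout="${1:-600}"
  interval="${2:-10}"
  elapsed=0
  while :; do
    statuses="$(oc get sriovnetworknodestates -n openshift-sriov-network-operator \
      -o jsonpath='{.items[*].status.syncStatus}')"
    pending=0
    for s in $statuses; do
      [ "$s" = "Succeeded" ] || pending=1
    done
    # Require at least one status so an empty result is not treated as success.
    if [ -n "$statuses" ] && [ "$pending" -eq 0 ]; then
      echo "all nodes synced"
      exit 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out; last statuses: ${statuses:-<none>}" >&2
      exit 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
)
```

For example, wait_for_sriov_sync 900 15 waits up to 15 minutes and checks every 15 seconds.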
Step 2: Create an SriovNetwork
Create an SriovNetwork CR that references the resourceName from the policy. This generates a NetworkAttachmentDefinition that KubeVirt VMs can consume.
Note
With VFIO passthrough, the VF is passed directly into the guest VM. The host kernel does not see the network interface, so pod-level CNI IPAM cannot assign IPs to the VF. IP addresses must be configured inside the guest (e.g. via cloud-init or DHCP from an external server on the L2 network).
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-kubevirt-net
  namespace: openshift-sriov-network-operator
spec:
  resourceName: kubevirt_sriov
  networkNamespace: default
  spoofChk: "off"
  trust: "on"
Verify the NetworkAttachmentDefinition was created:
oc get net-attach-def -n default sriov-kubevirt-net
Step 3: Create a VirtualMachine
Define a VirtualMachine with an sriov: {} interface pointing at the network attachment definition. Since IPAM is handled inside the guest, use cloud-init to configure a static IP on the SR-IOV interface.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-sriov
  namespace: default
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: default
              masquerade: {}
            - name: sriov-net
              sriov: {}
          disks:
            - name: containerdisk
              disk: {bus: virtio}
            - name: cloudinit
              disk: {bus: virtio}
        resources:
          requests:
            memory: "4Gi"
      networks:
        - name: default
          pod: {}
        - name: sriov-net
          multus:
            networkName: sriov-kubevirt-net
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
        - name: cloudinit
          cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: password123
              chpasswd: {expire: false}
              ssh_pwauth: true
              runcmd:
                - |
                  for i in $(seq 1 30); do
                    SRIOV_IF=$(ls -1 /sys/class/net/ | grep -v ^lo$ | grep -v ^enp1s0$ | head -1)
                    [ -n "$SRIOV_IF" ] && break
                    sleep 1
                  done
                  if [ -n "$SRIOV_IF" ]; then
                    nmcli con add type ethernet ifname $SRIOV_IF con-name sriov \
                      ipv4.addresses 192.168.0.1/24 ipv4.method manual
                    nmcli con up sriov
                  fi
The sriov: {} interface type tells KubeVirt to pass the VF into the VM via VFIO. KubeVirt’s resource injector automatically adds the extended resource request (e.g. openshift.io/kubevirt_sriov: "1", using the openshift.io prefix noted earlier) to the virt-launcher pod.
The cloud-init runcmd script waits for the SR-IOV interface to appear (the mlx5_core driver must load first), then configures a static IP using nmcli. Adjust the IP address for each VM accordingly.
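The interface-selection step of that script can be tested on its own. The sketch below isolates it as a function that picks the first interface that is neither the loopback device nor the primary NIC. The function name find_sriov_if and its two parameters are hypothetical; enp1s0 is the primary NIC name used in the cloud-init script and may differ depending on the guest's PCI topology.

```shell
# Print the first interface in a sysfs-style directory that is neither the
# loopback device nor the primary (masquerade) NIC.
# $1: directory to scan (default /sys/class/net)
# $2: primary NIC name to skip (default enp1s0, as in the cloud-init script)
find_sriov_if() {
  ls -1 "${1:-/sys/class/net}" | grep -v -x -e lo -e "${2:-enp1s0}" | head -1
}
```

Inside the guest, running find_sriov_if with no arguments prints the candidate SR-IOV interface, or nothing if the driver has not created one yet. If the VM has more than one secondary interface, this first-match approach is ambiguous and matching on MAC address would be more reliable.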
Verification
Check the VMI is Running
oc get vmi vm-sriov
Verify the VF Inside the Guest
Connect to the VM console:
virtctl console vm-sriov
Inside the guest, verify the SR-IOV interface and IP configuration:
ip a
lspci | grep -i mellanox
Note
NVIDIA NICs require the mlx5_core driver inside the guest. If no network interface appears but lspci shows the device, run modprobe mlx5_core.
Guest Driver Note
NVIDIA VFs passed via VFIO require the mlx5_core driver inside the guest VM. If the guest image does not include it, you need to either:
Use a guest image with NVIDIA DOCA-OFED or inbox mlx5_core drivers pre-installed
Install the driver via cloud-init at first boot
Warning
Without the driver, the VF PCI device appears in lspci output but no network interface is created.
Limitations
- No live migration
VFIO passthrough gives the VM direct access to hardware PCI resources. Live migration is not possible because hardware state cannot be serialized.
- Host-side RDMA not available
deviceType: vfio-pci is incompatible with isRdma: true on the SriovNetworkNodePolicy. RDMA works inside the guest VM because mlx5_core provides both Ethernet and RDMA capabilities. To enable RDMA inside the guest, install the required packages and load the kernel modules:
sudo dnf install -y kernel-modules-extra-$(uname -r) rdma-core
sudo modprobe -a ib_uverbs mlx5_ib
- IOMMU required
Nodes without IOMMU support cannot use VFIO passthrough. Run virt-host-validate qemu on the worker nodes to check hardware virtualization and IOMMU:
virt-host-validate qemu
All checks should show PASS, including Checking for device assignment IOMMU support and Checking if IOMMU is enabled by kernel. Confirm IOMMU groups are populated:
ls /sys/kernel/iommu_groups/
If IOMMU checks fail, enable it on the worker nodes via kernel parameter (intel_iommu=on iommu=pt or amd_iommu=on iommu=pt) and reboot.
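Where virt-host-validate is not installed on the node, the same two signals can be approximated with a small script. This is a quick first check under stated assumptions, not a replacement for the full validation: the function names iommu_in_cmdline and iommu_group_count are hypothetical, and the check looks only for the two flags named in this guide.

```shell
# Succeed if a kernel command line contains one of the IOMMU enable flags.
# $1: command line text (default: contents of /proc/cmdline)
iommu_in_cmdline() (
  cmdline="${1:-$(cat /proc/cmdline)}"
  for arg in $cmdline; do
    case "$arg" in
      intel_iommu=on|amd_iommu=on) exit 0 ;;
    esac
  done
  exit 1
)

# Print the number of IOMMU groups; zero means no groups are populated.
# $1: groups directory (default /sys/kernel/iommu_groups)
iommu_group_count() {
  ls -1 "${1:-/sys/kernel/iommu_groups}" 2>/dev/null | wc -l | tr -d ' '
}
```

On a worker node (for example from an oc debug node session), iommu_in_cmdline && echo "flag set" followed by iommu_group_count gives a rough picture: the flag should be present and the group count should be greater than zero.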
Troubleshooting
VF Not Available on Node
Check node allocatable resources:
oc describe node <node> | grep kubevirt_sriov
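To get just the number instead of grepping the describe output, the allocatable count can be read with a JSONPath query. The sketch below assumes the resource is published under the openshift.io prefix mentioned earlier; the helper name sriov_allocatable is hypothetical.

```shell
# Print the allocatable count of an SR-IOV extended resource on a node.
# $1: node name
# $2: resource name (default openshift.io/kubevirt_sriov)
sriov_allocatable() (
  node="$1"
  resource="${2:-openshift.io/kubevirt_sriov}"
  # Dots inside the resource name must be escaped in kubectl-style JSONPath.
  escaped="$(printf '%s' "$resource" | sed 's/\./\\./g')"
  oc get node "$node" -o jsonpath="{.status.allocatable.${escaped}}"
)
```

With the policy from Step 1, sriov_allocatable <node> would be expected to print the number of VFs that the device plugin advertises on that node. Empty output suggests the resource is not registered there.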
Verify the VFs are bound to vfio-pci:
oc exec -n openshift-sriov-network-operator <config-daemon-pod> -- lspci -k -s <vf-pci-addr>
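The line of interest in lspci -k output is "Kernel driver in use". As a small sketch, the helpers below extract that value from the command's output so the check can be scripted. The function names bound_driver and is_vfio_bound are hypothetical.

```shell
# Read `lspci -k` output on stdin and print the driver bound to the device.
bound_driver() {
  sed -n 's/^[[:space:]]*Kernel driver in use: //p' | head -1
}

# Read `lspci -k` output on stdin; succeed only if the driver is vfio-pci.
is_vfio_bound() {
  [ "$(bound_driver)" = "vfio-pci" ]
}
```

One way to use it is to pipe the oc exec command above into is_vfio_bound and act on the exit status. If bound_driver prints mlx5_core instead, the VF is still attached to the host network driver and the policy has not taken effect on that device.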
VMI Fails to Start
Check virt-launcher pod events:
oc describe pod virt-launcher-vm-sriov-xxxxx
Check KubeVirt logs:
oc logs virt-launcher-vm-sriov-xxxxx -c compute
Common causes:
IOMMU not enabled on the host
vfio-pci module not loaded
No VFs available (all allocated to other workloads)
No Network Interface in Guest
If lspci inside the guest shows the device but no interface appears in ip link:
modprobe mlx5_core
If the module is not available, install NVIDIA DOCA-OFED drivers in the guest image.