KubeVirt SR-IOV Integration
This guide explains how to attach SR-IOV Virtual Functions (VFs) to KubeVirt virtual machines using VFIO PCI passthrough.
With deviceType: vfio-pci, a VF’s PCI device is passed directly into the guest VM via the VFIO userspace interface.
The VM gets near-native network performance because the data path bypasses the host kernel entirely.
Prerequisites
KubeVirt installed on the cluster:
Kubernetes: KubeVirt installation guide
Red Hat OpenShift: OpenShift Virtualization installation guide
IOMMU enabled on worker nodes:
Intel: kernel parameter intel_iommu=on iommu=pt
AMD: kernel parameter amd_iommu=on iommu=pt
vfio-pci kernel module available on worker nodes
virtctl CLI tool installed:
Kubernetes: virtctl client tool
Red Hat OpenShift: Installing virtctl on OpenShift
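A quick way to confirm the IOMMU prerequisite on a worker node is to inspect the running kernel's command line. The following is a minimal sketch, assuming shell access to the node; the helper name has_iommu_params is hypothetical and only looks for the two parameters listed above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line contains
# intel_iommu=on or amd_iommu=on as a separate parameter.
has_iommu_params() {
    cmdline="$1"
    case " $cmdline " in
        *" intel_iommu=on "*|*" amd_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a worker node, check the command line of the running kernel.
if [ -r /proc/cmdline ]; then
    if has_iommu_params "$(cat /proc/cmdline)"; then
        echo "IOMMU kernel parameter present"
    else
        echo "IOMMU kernel parameter missing"
    fi
fi
```

The check only covers the boot parameters; it does not prove that the firmware has VT-d or AMD-Vi enabled.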
Install the Network Operator
Install the Network Operator with NFD and SR-IOV Network Operator enabled:
values.yaml:
nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
helm install network-operator nvidia/network-operator \
  -n nvidia-network-operator \
  --create-namespace \
  --version v26.4.0-beta.7 \
  -f values.yaml \
  --wait
Create a NicClusterPolicy
Once the Network Operator is installed, create a NicClusterPolicy with Multus CNI and CNI plugins:
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.7
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.7
      imagePullSecrets: []
kubectl apply -f nicclusterpolicy.yaml
Verify that the NicClusterPolicy is ready:
kubectl get nicclusterpolicy nic-cluster-policy
The state should show ready before proceeding.
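Instead of re-running the command by hand, the readiness check can be polled. The sketch below uses a hypothetical helper, wait_for_output, that retries any command until it prints an expected value. The usage line assumes the policy reports its state in .status.state; confirm the field on your cluster with kubectl get nicclusterpolicy nic-cluster-policy -o yaml.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until its output equals the
# expected value, or give up after the given number of attempts.
# POLL_INTERVAL (seconds between attempts) defaults to 5.
wait_for_output() {
    expected="$1"
    attempts="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        [ "$("$@" 2>/dev/null)" = "$expected" ] && return 0
        i=$((i + 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    return 1
}

# Usage against a cluster (assumption: the state is exposed at .status.state):
# wait_for_output ready 60 kubectl get nicclusterpolicy nic-cluster-policy \
#     -o jsonpath='{.status.state}'
```

With the defaults, the usage line waits up to roughly five minutes and returns non-zero on timeout, which makes it suitable for a provisioning script.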
Step 1: Create an SriovNetworkNodePolicy
Configure VFs with deviceType: vfio-pci. The operator creates the VFs and binds them to the vfio-pci driver, making them available as allocatable extended resources on the node.
Set isRdma: false, because host-side RDMA is not compatible with vfio-pci. The guest VM must have the mlx5_core kernel module available.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: kubevirt-policy
  namespace: nvidia-network-operator
spec:
  resourceName: kubevirt_sriov
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  nicSelector:
    vendor: "15b3"
    pfNames:
      - ens1f0
  deviceType: vfio-pci
  isRdma: false
Wait for the policy to be applied:
kubectl get sriovnetworknodestates -n nvidia-network-operator -o jsonpath='{.items[*].status.syncStatus}'
The output should show Succeeded for all nodes.
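The jsonpath query prints one status per node, separated by spaces, so the "all nodes" condition is easy to miss by eye on larger clusters. A minimal sketch of a check follows; the helper name all_succeeded is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if at least one status is given and every
# whitespace-separated status equals "Succeeded".
all_succeeded() {
    count=0
    for status in $1; do
        [ "$status" = "Succeeded" ] || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# Usage against a cluster:
# all_succeeded "$(kubectl get sriovnetworknodestates -n nvidia-network-operator \
#     -o jsonpath='{.items[*].status.syncStatus}')" && echo "policy applied"
```

An empty result is treated as a failure, since it usually means no SriovNetworkNodeState objects exist yet.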
Step 2: Create an SriovNetwork
Create an SriovNetwork CR that references the resourceName from the policy. This generates a NetworkAttachmentDefinition that KubeVirt VMs can consume.
Note
With VFIO passthrough, the VF is passed directly into the guest VM. The host kernel does not see the network interface, so pod-level CNI IPAM cannot assign IPs to the VF. IP addresses must be configured inside the guest (e.g. via cloud-init or DHCP from an external server on the L2 network).
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-kubevirt-net
  namespace: nvidia-network-operator
spec:
  resourceName: kubevirt_sriov
  networkNamespace: default
  spoofChk: "off"
  trust: "on"
Verify the NetworkAttachmentDefinition was created:
kubectl get net-attach-def -n default sriov-kubevirt-net
Step 3: Create a VirtualMachine
Define a VirtualMachine with an sriov: {} interface pointing at the network attachment definition. Since IPAM is handled inside the guest, use cloud-init to configure a static IP on the SR-IOV interface.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-sriov
  namespace: default
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: default
              masquerade: {}
            - name: sriov-net
              sriov: {}
          disks:
            - name: containerdisk
              disk: {bus: virtio}
            - name: cloudinit
              disk: {bus: virtio}
        resources:
          requests:
            memory: "4Gi"
      networks:
        - name: default
          pod: {}
        - name: sriov-net
          multus:
            networkName: sriov-kubevirt-net
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
        - name: cloudinit
          cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: password123
              chpasswd: {expire: false}
              ssh_pwauth: true
              runcmd:
                - |
                  for i in $(seq 1 30); do
                    SRIOV_IF=$(ls -1 /sys/class/net/ | grep -v ^lo$ | grep -v ^enp1s0$ | head -1)
                    [ -n "$SRIOV_IF" ] && break
                    sleep 1
                  done
                  if [ -n "$SRIOV_IF" ]; then
                    nmcli con add type ethernet ifname $SRIOV_IF con-name sriov \
                      ipv4.addresses 192.168.0.1/24 ipv4.method manual
                    nmcli con up sriov
                  fi
The sriov: {} interface type tells KubeVirt to pass the VF into the VM via VFIO. KubeVirt’s resource injector automatically adds the extended resource request (e.g. nvidia.com/kubevirt_sriov: "1") to the virt-launcher pod.
The cloud-init runcmd script waits for the SR-IOV interface to appear (the mlx5_core driver must load first), then configures a static IP using nmcli. Adjust the IP address for each VM accordingly.
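The interface-detection step of that script can be tried in isolation before baking it into cloud-init. The sketch below wraps the same logic in a hypothetical function, find_sriov_if. It assumes, as the script does, that the masquerade interface is named enp1s0; both the sysfs directory and that name are parameters so the function can be tested against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: print the first network interface that is neither the
# loopback device nor the primary (pod network) interface.
#   $1: directory listing interfaces (default: /sys/class/net)
#   $2: name of the primary interface (default: enp1s0, an assumption)
find_sriov_if() {
    net_dir="${1:-/sys/class/net}"
    primary="${2:-enp1s0}"
    ls -1 "$net_dir" | grep -v -x -e lo -e "$primary" | head -1
}

# Inside the guest:
# SRIOV_IF=$(find_sriov_if)
# [ -n "$SRIOV_IF" ] && echo "SR-IOV interface: $SRIOV_IF"
```

If the guest image names its primary interface differently, pass that name as the second argument; otherwise the function may pick the pod network interface instead of the VF.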
Verification
Check the VMI is Running
kubectl get vmi vm-sriov
Verify the VF Inside the Guest
Connect to the VM console:
virtctl console vm-sriov
Inside the guest, verify the SR-IOV interface and IP configuration:
ip a
lspci | grep -i mellanox
Note
NVIDIA NICs require the mlx5_core driver inside the guest. If no network interface appears but lspci shows the device, run modprobe mlx5_core.
Guest Driver Note
NVIDIA VFs passed via VFIO require the mlx5_core driver inside the guest VM. If the guest image does not include it, you need to either:
Use a guest image with NVIDIA DOCA-OFED or inbox mlx5_core drivers pre-installed
Install the driver via cloud-init at first boot
Warning
Without the driver, the VF PCI device appears in lspci output but no network interface is created.
Limitations
- No live migration
VFIO passthrough gives the VM direct access to hardware PCI resources. Live migration is not possible because hardware state cannot be serialized.
- Host-side RDMA not available
deviceType: vfio-pci is incompatible with isRdma: true on the SriovNetworkNodePolicy. RDMA works inside the guest VM because mlx5_core provides both Ethernet and RDMA capabilities. To enable RDMA inside the guest, install the required packages and load the kernel modules:
sudo dnf install -y kernel-modules-extra-$(uname -r) rdma-core
sudo modprobe ib_uverbs mlx5_ib
- IOMMU required
Nodes without IOMMU support cannot use VFIO passthrough. Run virt-host-validate qemu on the worker nodes to check hardware virtualization and IOMMU:
virt-host-validate qemu
All checks should show PASS, including Checking for device assignment IOMMU support and Checking if IOMMU is enabled by kernel.
Confirm IOMMU groups are populated:
ls /sys/kernel/iommu_groups/
If IOMMU checks fail, enable it on the worker nodes via kernel parameter (intel_iommu=on iommu=pt or amd_iommu=on iommu=pt) and reboot.
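The IOMMU group check can also be scripted, for example as part of node validation. This is a minimal sketch, assuming shell access to the worker node; count_iommu_groups is a hypothetical helper, and the directory is a parameter only so the function can be exercised outside a real node.

```shell
#!/bin/sh
# Hypothetical helper: print the number of IOMMU groups found under the given
# directory (default: /sys/kernel/iommu_groups). Prints 0 if it does not exist.
count_iommu_groups() {
    groups_dir="${1:-/sys/kernel/iommu_groups}"
    if [ ! -d "$groups_dir" ]; then
        echo 0
        return 0
    fi
    ls -1 "$groups_dir" | wc -l | tr -d ' '
}

# On a worker node:
# [ "$(count_iommu_groups)" -gt 0 ] || echo "no IOMMU groups: check firmware and kernel parameters"
```

A count of zero typically means the IOMMU is disabled in firmware or was not enabled on the kernel command line.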
Troubleshooting
VF Not Available on Node
Check node allocatable resources:
kubectl describe node <node> | grep kubevirt_sriov
Verify the VFs are bound to vfio-pci:
kubectl exec -n nvidia-network-operator <config-daemon-pod> -- lspci -k -s <vf-pci-addr>
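When checking several VFs, it helps to extract just the bound driver from the lspci -k output. The following sketch assumes the usual "Kernel driver in use:" line that lspci -k prints for a bound device; driver_in_use is a hypothetical helper that reads that output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k` output on stdin and print the name of
# the driver bound to the device. Prints nothing if no driver is bound.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p' | head -1
}

# Usage against a cluster (placeholders as in the command above):
# kubectl exec -n nvidia-network-operator <config-daemon-pod> -- \
#     lspci -k -s <vf-pci-addr> | driver_in_use
```

For a correctly configured VF the helper prints vfio-pci. Any other value, or no output at all, indicates that the policy has not been applied to that device.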
VMI Fails to Start
Check virt-launcher pod events:
kubectl describe pod virt-launcher-vm-sriov-xxxxx
Check KubeVirt logs:
kubectl logs virt-launcher-vm-sriov-xxxxx -c compute
Common causes:
IOMMU not enabled on the host
vfio-pci module not loaded
No VFs available (all allocated to other workloads)
No Network Interface in Guest
If lspci inside the guest shows the device but no interface appears in ip link:
modprobe mlx5_core
If the module is not available, install NVIDIA DOCA-OFED drivers in the guest image.