KubeVirt SR-IOV Integration

This guide explains how to attach SR-IOV Virtual Functions (VFs) to KubeVirt virtual machines using VFIO PCI passthrough.

With deviceType: vfio-pci, a VF’s PCI device is passed directly into the guest VM via the VFIO userspace interface. The VM gets near-native network performance because the data path bypasses the host kernel entirely.

Prerequisites

Install the Network Operator

Install the Network Operator with NFD and SR-IOV Network Operator enabled:

values.yaml:

nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true

Then install the chart with Helm:

helm install network-operator nvidia/network-operator \
  -n nvidia-network-operator \
  --create-namespace \
  --version v26.4.0-beta.7 \
  -f values.yaml \
  --wait
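
Verify that the operator and node feature discovery pods are running before proceeding, for example:

kubectl get pods -n nvidia-network-operator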

Create a NicClusterPolicy

Once the Network Operator is installed, create a NicClusterPolicy with Multus CNI and CNI plugins:

apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.7
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.7
      imagePullSecrets: []

Apply the manifest:

kubectl apply -f nicclusterpolicy.yaml

Verify that the NicClusterPolicy is ready:

kubectl get nicclusterpolicy nic-cluster-policy

The state should show ready before proceeding.
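
You can also read the state field directly (assuming the CR reports it under .status.state, as recent operator releases do):

kubectl get nicclusterpolicy nic-cluster-policy -o jsonpath='{.status.state}'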

Step 1: Create an SriovNetworkNodePolicy

Configure VFs with deviceType: vfio-pci. The operator creates the VFs and binds them to the vfio-pci driver, making them available as allocatable extended resources on the node.

Set isRdma: false (RDMA is not compatible with vfio-pci). The guest VM must have the mlx5_core kernel module available.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: kubevirt-policy
  namespace: nvidia-network-operator
spec:
  resourceName: kubevirt_sriov
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  nicSelector:
    vendor: "15b3"
    pfNames:
      - ens1f0
  deviceType: vfio-pci
  isRdma: false

Wait for the policy to be applied:

kubectl get sriovnetworknodestates -n nvidia-network-operator -o jsonpath='{.items[*].status.syncStatus}'

The output should show Succeeded for all nodes.
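
Once the sync succeeds, the VFs are advertised as an allocatable extended resource on the node (prefixed with nvidia.com/, e.g. nvidia.com/kubevirt_sriov):

kubectl get node <node> -o jsonpath='{.status.allocatable}'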

Step 2: Create an SriovNetwork

Create an SriovNetwork CR that references the resourceName from the policy. This generates a NetworkAttachmentDefinition that KubeVirt VMs can consume.

Note

With VFIO passthrough, the VF is passed directly into the guest VM. The host kernel does not see the network interface, so pod-level CNI IPAM cannot assign IPs to the VF. IP addresses must be configured inside the guest (e.g. via cloud-init or DHCP from an external server on the L2 network).

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-kubevirt-net
  namespace: nvidia-network-operator
spec:
  resourceName: kubevirt_sriov
  networkNamespace: default
  spoofChk: "off"
  trust: "on"

Verify the NetworkAttachmentDefinition was created:

kubectl get net-attach-def -n default sriov-kubevirt-net
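
Inspecting the full object shows the generated CNI configuration and the resource annotation that ties the network to the VF pool (exact fields may vary by operator version):

kubectl get net-attach-def sriov-kubevirt-net -n default -o yaml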

Step 3: Create a VirtualMachine

Define a VirtualMachine with an sriov: {} interface pointing at the network attachment definition. Since IPAM is handled inside the guest, use cloud-init to configure a static IP on the SR-IOV interface.

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-sriov
  namespace: default
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: default
              masquerade: {}
            - name: sriov-net
              sriov: {}
          disks:
            - name: containerdisk
              disk: {bus: virtio}
            - name: cloudinit
              disk: {bus: virtio}
        resources:
          requests:
            memory: "4Gi"
      networks:
        - name: default
          pod: {}
        - name: sriov-net
          multus:
            networkName: sriov-kubevirt-net
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
        - name: cloudinit
          cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: password123
              chpasswd: {expire: false}
              ssh_pwauth: true
              runcmd:
                - |
                  for i in $(seq 1 30); do
                    SRIOV_IF=$(ls -1 /sys/class/net/ | grep -v ^lo$ | grep -v ^enp1s0$ | head -1)
                    [ -n "$SRIOV_IF" ] && break
                    sleep 1
                  done
                  if [ -n "$SRIOV_IF" ]; then
                    nmcli con add type ethernet ifname $SRIOV_IF con-name sriov \
                      ipv4.addresses 192.168.0.1/24 ipv4.method manual
                    nmcli con up sriov
                  fi

The sriov: {} interface type tells KubeVirt to pass the VF into the VM via VFIO. KubeVirt’s resource injector automatically adds the extended resource request (e.g. nvidia.com/kubevirt_sriov: "1") to the virt-launcher pod.
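
Once the VM starts, you can confirm the injected resource request on the launcher pod (virt-launcher pods carry a kubevirt.io/domain label matching the VM name):

kubectl describe pod -l kubevirt.io/domain=vm-sriov | grep kubevirt_sriov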

The cloud-init runcmd script waits for the SR-IOV interface to appear (the mlx5_core driver must load first), then configures a static IP using nmcli. Adjust the IP address per VM to avoid address conflicts on the shared L2 network.

Verification

Check the VMI is Running

kubectl get vmi vm-sriov
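
Optionally, wait for the VMI to become ready (VMIs expose a Ready condition):

kubectl wait vmi vm-sriov --for=condition=Ready --timeout=5m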

Verify the VF Inside the Guest

Connect to the VM console:

virtctl console vm-sriov

Inside the guest, verify the SR-IOV interface and IP configuration:

ip a
lspci | grep -i mellanox

Note

NVIDIA NICs require the mlx5_core driver inside the guest. If no network interface appears but lspci shows the device, run modprobe mlx5_core.

Guest Driver Note

NVIDIA VFs passed via VFIO require the mlx5_core driver inside the guest VM. If the guest image does not include it, you need to either:

  • Use a guest image with NVIDIA DOCA-OFED or inbox mlx5_core drivers pre-installed

  • Install the driver via cloud-init at first boot (a minimal sketch follows below)

Warning

Without the driver, the VF PCI device appears in lspci output but no network interface is created.
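
For the cloud-init option, a minimal sketch assuming a Fedora-family guest where the mlx5 modules ship in kernel-modules-extra (verify the correct package and module names for your image):

#cloud-config
runcmd:
  # Install the module package matching the running kernel, then load the driver.
  - dnf install -y kernel-modules-extra-$(uname -r)
  - modprobe mlx5_core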

Limitations

No live migration

VFIO passthrough gives the VM direct access to hardware PCI resources. Live migration is not possible because the passed-through device's hardware state cannot be serialized and restored on a destination host.

Host-side RDMA not available

deviceType: vfio-pci is incompatible with isRdma: true on the SriovNetworkNodePolicy. RDMA still works inside the guest VM because mlx5_core provides both Ethernet and RDMA capabilities. To enable RDMA inside the guest, install the required packages and load the kernel modules:

sudo dnf install -y kernel-modules-extra-$(uname -r) rdma-core
sudo modprobe ib_uverbs mlx5_ib
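
To confirm the RDMA device is then visible inside the guest, list the verbs devices (ibv_devices is part of rdma-core):

ibv_devices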

IOMMU required

Nodes without IOMMU support cannot use VFIO passthrough. Run virt-host-validate qemu on the worker nodes to check hardware virtualization and IOMMU:

virt-host-validate qemu

All checks should show PASS, including "Checking for device assignment IOMMU support" and "Checking if IOMMU is enabled by kernel".

Confirm IOMMU groups are populated:

ls /sys/kernel/iommu_groups/

If IOMMU checks fail, enable it on the worker nodes via kernel parameter (intel_iommu=on iommu=pt or amd_iommu=on iommu=pt) and reboot.
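
On RHEL/Fedora-family nodes this can be done with grubby, for example (Intel shown; use amd_iommu=on on AMD hosts):

sudo grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt"
sudo reboot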

Troubleshooting

VF Not Available on Node

Check node allocatable resources:

kubectl describe node <node> | grep kubevirt_sriov

Verify the VFs are bound to vfio-pci:

kubectl exec -n nvidia-network-operator <config-daemon-pod> -- lspci -k -s <vf-pci-addr>

VMI Fails to Start

Check virt-launcher pod events:

kubectl describe pod virt-launcher-vm-sriov-xxxxx

Check KubeVirt logs:

kubectl logs virt-launcher-vm-sriov-xxxxx -c compute

Common causes:

  • IOMMU not enabled on the host

  • vfio-pci module not loaded

  • No VFs available (all allocated to other workloads)

No Network Interface in Guest

If lspci inside the guest shows the device but no interface appears in ip link:

modprobe mlx5_core

If the module is not available, install NVIDIA DOCA-OFED drivers in the guest image.