Heterogeneous Clusters with NicNodePolicy

By default, the NVIDIA Network Operator uses a single NicClusterPolicy (NCP) resource to manage all NIC-related components cluster-wide. In heterogeneous clusters, where different groups of nodes require different DOCA-OFED driver versions or device plugin configurations, this single-policy model is not sufficient. NicNodePolicy (NNP) addresses this by allowing you to create multiple per-node-group policies, each targeting specific nodes via nodeSelector labels.

Warning

NicNodePolicy is recommended for new deployments only. There is no automated migration path to transition an existing NicClusterPolicy-only deployment to use NicNodePolicies. Migrating an existing cluster requires manually removing per-node sections from NicClusterPolicy and creating corresponding NicNodePolicy resources, which causes temporary disruption to affected components.

When to Use NicNodePolicy

For new deployments, NicNodePolicy is the recommended approach for managing per-node NIC components (DOCA-OFED driver, RDMA shared device plugin, SR-IOV device plugin), even in homogeneous clusters. Using NicNodePolicy from the start provides a consistent deployment model and makes it straightforward to support heterogeneous configurations in the future without re-architecting your policies.

NicNodePolicy is especially valuable when:

  • Different groups of nodes need different DOCA-OFED driver versions (e.g., GPU nodes on the latest DOCA release, storage nodes on an LTS release).

  • Different node groups need different SR-IOV device plugin configurations.

  • Different node groups need different RDMA shared device plugin configurations.

  • You need independent DOCA-OFED driver upgrade schedules per node group.

NicClusterPolicy alone is sufficient when you have an existing deployment that already manages per-node components through NCP and does not require heterogeneous configurations. Cluster-wide components such as Multus, NV-IPAM, and NIC Configuration Operator are always managed exclusively by NicClusterPolicy regardless of whether NicNodePolicy is used.

How NicClusterPolicy and NicNodePolicy Work Together

Overview

NicClusterPolicy is a singleton resource (always named nic-cluster-policy) that manages up to 12 component types cluster-wide. NicNodePolicy allows creating multiple instances, each targeting a specific set of nodes via nodeSelector, but only for a subset of 3 components.

The following table shows which components can be managed by each policy type. See the NicClusterPolicySpec and NicNodePolicySpec API references for full field details.

Component Ownership

Component                     NCP    NNP
DOCA-OFED Driver              Yes    Yes
RDMA Shared Device Plugin     Yes    Yes
SR-IOV Device Plugin          Yes    Yes
Multus CNI                    Yes    No
CNI Plugins                   Yes    No
IPoIB CNI                     Yes    No
NV-IPAM                       Yes    No
NIC Configuration Operator    Yes    No
DOCA Telemetry Service        Yes    No
Spectrum-X Operator           Yes    No
IB Kubernetes                 Yes    No
NIC Feature Discovery         Yes    No

Cluster-wide infrastructure components (Multus, NV-IPAM, NIC Configuration Operator, etc.) always remain in NicClusterPolicy. Per-node components (DOCA-OFED driver and device plugins) can be moved to NicNodePolicy instances to support heterogeneous configurations.

NCP-Only Deployment Flow

In a homogeneous cluster, NicClusterPolicy manages everything. All nodes run the same DOCA-OFED driver version and device plugin configuration:

apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvstaging/mellanox
    version: doca3.4.0-26.04-0.6.1.0-0
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
  rdmaSharedDevicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: nvcr.io/nvstaging/mellanox
    version: network-operator-v26.4.0-beta.7
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_a",
            "rdmaHcaMax": 63,
            "selectors": {
              "vendors": ["15b3"]
            }
          }
        ]
      }
  secondaryNetwork:
    multus:
      image: multus-cni
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.7
  nvIpam:
    image: nvidia-k8s-ipam
    repository: nvcr.io/nvstaging/mellanox
    version: network-operator-v26.4.0-beta.7

NCP + NNP Deployment Flow

In a heterogeneous cluster, NicClusterPolicy retains only cluster-wide components. Per-node components move to NicNodePolicy instances, each targeting a specific node group:

# Cluster-wide components only
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  secondaryNetwork:
    multus:
      image: multus-cni
      repository: nvcr.io/nvstaging/mellanox
      version: network-operator-v26.4.0-beta.7
  nvIpam:
    image: nvidia-k8s-ipam
    repository: nvcr.io/nvstaging/mellanox
    version: network-operator-v26.4.0-beta.7
---
# GPU nodes: latest DOCA driver with RDMA
apiVersion: mellanox.com/v1alpha1
kind: NicNodePolicy
metadata:
  name: gpu-nodes
spec:
  nodeSelector:
    node-role.kubernetes.io/gpu: ""
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvstaging/mellanox
    version: doca3.4.0-26.04-0.6.1.0-0
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
  rdmaSharedDevicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: nvcr.io/nvstaging/mellanox
    version: network-operator-v26.4.0-beta.7
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_a",
            "rdmaHcaMax": 63,
            "selectors": {
              "vendors": ["15b3"]
            }
          }
        ]
      }
---
# Storage nodes: LTS DOCA driver with SR-IOV
apiVersion: mellanox.com/v1alpha1
kind: NicNodePolicy
metadata:
  name: storage-nodes
spec:
  nodeSelector:
    node-role.kubernetes.io/storage: ""
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvstaging/mellanox
    version: 24.10-0.7.0.0-0
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 2
  sriovDevicePlugin:
    image: sriov-network-device-plugin
    repository: nvcr.io/nvstaging/mellanox
    version: network-operator-v26.4.0-beta.7
    config: |
      {
        "resourceList": [
          {
            "resourcePrefix": "nvidia.com",
            "resourceName": "sriov_rdma",
            "selectors": {
              "vendors": ["15b3"],
              "isRdma": true
            }
          }
        ]
      }

Notice that the ofedDriver section is absent from NicClusterPolicy and present in both NicNodePolicy instances. Each NNP targets a distinct set of nodes with different DOCA-OFED versions and device plugin types.

DOCA-OFED Driver Upgrades with NicNodePolicy

Each NicNodePolicy with an ofedDriver section gets its own independent upgrade state manager. Upgrading the DOCA-OFED driver on one node group does not affect other node groups.

Key behaviors:

  • Each NicNodePolicy has a separate maxParallelUpgrades setting.

  • In maintenance-operator mode, each NicNodePolicy creates its own NodeMaintenance custom resources with a policy-specific requestor ID.

  • Upgrade progress is tracked independently per policy via the nvidia.com/ofed-driver-upgrade-state node label.

Note

For general DOCA-OFED driver upgrade configuration, see the Automatic DOCA-OFED Driver Upgrade section in the Life Cycle Management page.

Deployment Rules and Restrictions

Section Exclusivity

A given section (ofedDriver, rdmaSharedDevicePlugin, sriovDevicePlugin) can exist in either NicClusterPolicy or NicNodePolicy instances, but not both simultaneously. This rule is enforced by the admission webhook and prevents conflicting configurations.

For example:

VALID configuration:
  NicClusterPolicy:  secondaryNetwork, nvIpam
  NicNodePolicy "gpu-nodes":      ofedDriver, rdmaSharedDevicePlugin
  NicNodePolicy "storage-nodes":  ofedDriver, sriovDevicePlugin

INVALID configuration (rejected by webhook):
  NicClusterPolicy:  ofedDriver            <-- conflict
  NicNodePolicy "gpu-nodes":  ofedDriver   <-- same section in both

Note

Section exclusivity is enforced per section, not per NicNodePolicy instance. If any NicNodePolicy defines ofedDriver, then NicClusterPolicy must not define ofedDriver, and vice versa.
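The exclusivity rule can be modeled in a few lines of Python. This is only an illustrative sketch, not the operator's webhook code: the function name find_exclusivity_conflicts and the plain-dict representation of each policy's spec are assumptions made for the example. It reports every per-node section that appears in NicClusterPolicy and in at least one NicNodePolicy.

```python
# Hypothetical model of the section-exclusivity rule; not the operator's webhook code.
PER_NODE_SECTIONS = ("ofedDriver", "rdmaSharedDevicePlugin", "sriovDevicePlugin")


def find_exclusivity_conflicts(ncp_spec, nnp_specs):
    """Return {section: [NicNodePolicy names]} for sections defined in both policy types.

    ncp_spec:  dict form of the NicClusterPolicy .spec
    nnp_specs: dict mapping NicNodePolicy name -> dict form of its .spec
    """
    conflicts = {}
    for section in PER_NODE_SECTIONS:
        if section not in ncp_spec:
            continue  # NCP does not define this section, so no conflict is possible
        owners = sorted(name for name, spec in nnp_specs.items() if section in spec)
        if owners:
            conflicts[section] = owners
    return conflicts


# The VALID and INVALID layouts from the example above, reduced to section names.
valid_ncp = {"secondaryNetwork": {}, "nvIpam": {}}
invalid_ncp = {"ofedDriver": {}}
nnps = {
    "gpu-nodes": {"ofedDriver": {}, "rdmaSharedDevicePlugin": {}},
    "storage-nodes": {"ofedDriver": {}, "sriovDevicePlugin": {}},
}

print(find_exclusivity_conflicts(valid_ncp, nnps))    # {}
print(find_exclusivity_conflicts(invalid_ncp, nnps))  # {'ofedDriver': ['gpu-nodes', 'storage-nodes']}
```

Note that two NicNodePolicies defining the same section is not a conflict in this model; only an NCP/NNP pair is.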

Node Selector Overlap Prevention

Two NicNodePolicy instances must not select overlapping sets of nodes. This is validated at two points:

  1. At admission time – the validating webhook resolves node selectors against actual cluster nodes and checks for intersection. If overlap is found, the create or update request is rejected.

  2. At runtime – the NicNodePolicy controller re-checks for overlap on every reconciliation to catch cases where nodes are re-labeled after the initial admission check. If overlap is detected, the affected NicNodePolicy status is set to Error with a message describing which nodes overlap, and no DaemonSet changes are applied until the overlap is resolved.

# VALID: non-overlapping node selectors
---
# NicNodePolicy "gpu-nodes"
spec:
  nodeSelector:
    node-role.kubernetes.io/gpu: ""
---
# NicNodePolicy "storage-nodes"
spec:
  nodeSelector:
    node-role.kubernetes.io/storage: ""
# INVALID: if any node has BOTH labels, overlap is detected
---
# NicNodePolicy "pool-a"
spec:
  nodeSelector:
    datacenter: us-east
---
# NicNodePolicy "pool-b"
spec:
  nodeSelector:
    rack: row-1

Note

The overlap check resolves selectors against the actual nodes in the cluster. Two NicNodePolicies with different label keys can still overlap if any node happens to match both selectors.
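To see why different label keys can still overlap, the following Python sketch resolves selectors against a small set of nodes. It is a simplified model, not the operator's implementation: the helper names and the node inventory are hypothetical, and a nodeSelector is treated as a plain map of required label key/value pairs.

```python
from itertools import combinations


def selector_matches(node_selector, node_labels):
    """A nodeSelector matches when every key/value pair is present in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def find_overlaps(policies, nodes):
    """Return {(policy_a, policy_b): [node names selected by both policies]}.

    policies: dict mapping NicNodePolicy name -> nodeSelector dict
    nodes:    dict mapping node name -> label dict
    """
    selected = {
        name: {node for node, labels in nodes.items() if selector_matches(selector, labels)}
        for name, selector in policies.items()
    }
    overlaps = {}
    for a, b in combinations(sorted(selected), 2):
        shared = sorted(selected[a] & selected[b])
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# Hypothetical node inventory: node-1 carries both labels used by the two pools.
nodes = {
    "node-1": {"datacenter": "us-east", "rack": "row-1"},
    "node-2": {"datacenter": "us-east", "rack": "row-2"},
    "node-3": {"datacenter": "us-west", "rack": "row-1"},
}
policies = {
    "pool-a": {"datacenter": "us-east"},
    "pool-b": {"rack": "row-1"},
}

print(find_overlaps(policies, nodes))  # {('pool-a', 'pool-b'): ['node-1']}
```

Removing either label from node-1 makes the two selectors disjoint again, which is the kind of re-labeling the runtime check is described as catching.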

Naming Requirements

NicNodePolicy names must be at most 30 characters long. This limit exists because derived resource names (such as DaemonSet names and label values) incorporate the policy name and must stay within the Kubernetes 63-character label value limit.

Kubernetes short names for NicNodePolicy: nicnode, nnp.

Each NicNodePolicy creates uniquely-named DaemonSets by appending the policy name as a suffix:

DaemonSet Naming

Policy                      DaemonSet Name Pattern
NicClusterPolicy            mofed-<os><version>-<hash>-ds
NicNodePolicy gpu-nodes     mofed-<os><version>-<hash>-gpu-nodes-ds
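The naming rules above can be checked before a policy is applied. The Python sketch below is a hypothetical pre-flight helper, not part of the operator: the function names are invented for illustration, and the OS/version string and hash passed in the example are placeholder values rather than real operator output.

```python
MAX_POLICY_NAME_LENGTH = 30  # NicNodePolicy name limit stated in this section
MAX_LABEL_VALUE_LENGTH = 63  # Kubernetes label value limit


def validate_policy_name(name):
    """Return the name unchanged, or raise ValueError if it exceeds 30 characters."""
    if len(name) > MAX_POLICY_NAME_LENGTH:
        raise ValueError(
            f"policy name {name!r} is {len(name)} characters; "
            f"the limit is {MAX_POLICY_NAME_LENGTH}"
        )
    return name


def daemonset_name(os_version, config_hash, policy_name=None):
    """Build a name following the mofed-<os><version>-<hash>[-<policy>]-ds pattern."""
    parts = ["mofed", os_version, config_hash]
    if policy_name is not None:
        # NicNodePolicy DaemonSets carry the policy name as a suffix.
        parts.append(validate_policy_name(policy_name))
    parts.append("ds")
    return "-".join(parts)


# "ubuntu22.04" and "abc123" are placeholders for <os><version> and <hash>.
name = daemonset_name("ubuntu22.04", "abc123", "gpu-nodes")
print(name)                                 # mofed-ubuntu22.04-abc123-gpu-nodes-ds
print(len(name) <= MAX_LABEL_VALUE_LENGTH)  # True
```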

Prerequisites

The admission controller must be enabled for the section exclusivity and node overlap validation rules to be enforced. Set operator.admissionController.enabled to true in the Helm chart values. See Advanced Configurations for admission controller deployment details.

Verifying the Deployment

List all NicNodePolicy resources:

kubectl get nicnodepolicies

Check the status of a specific NicNodePolicy:

kubectl get nnp gpu-nodes -o yaml

A state: ready status indicates that the NicNodePolicy has been successfully applied. A state: error status with a reason field indicates a problem, such as a node selector overlap.

Verify the DaemonSets created by a NicNodePolicy:

kubectl get ds -n nvidia-network-operator -l nvidia.com/ofed-driver=

Check for policy errors across all NicNodePolicies:

kubectl get nnp -o custom-columns=NAME:.metadata.name,STATE:.status.state,REASON:.status.reason
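For scripted checks, the same status fields can be read from JSON output. The sketch below assumes the standard Kubernetes list shape (an items array) as produced by kubectl get nnp -o json, and relies only on the .status.state and .status.reason fields mentioned above. The function name, the "unknown" fallback for a missing state, and the sample reason text are illustrative assumptions.

```python
import json


def failing_policies(nnp_list):
    """Return [(name, state, reason)] for every NicNodePolicy whose state is not 'ready'."""
    failures = []
    for item in nnp_list.get("items", []):
        status = item.get("status", {})
        # A policy without a reported state is listed as "unknown" (assumption).
        state = status.get("state", "unknown")
        if state != "ready":
            failures.append((item["metadata"]["name"], state, status.get("reason", "")))
    return failures


# Hand-written sample in the shape of `kubectl get nnp -o json`; not real cluster output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "gpu-nodes"}, "status": {"state": "ready"}},
    {"metadata": {"name": "storage-nodes"},
     "status": {"state": "error", "reason": "node selector overlap (illustrative message)"}}
  ]
}
""")

print(failing_policies(sample))
# [('storage-nodes', 'error', 'node selector overlap (illustrative message)')]
```

In practice the input could come from json.load(sys.stdin) with the kubectl output piped in.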